I’m trying to serialize a Java string into an array of bytes and then deserialize the array into a string again. It seemed to work OK until I tested it with the Unicode character \ude4e. For some reason, the original string "\ude4e" is not equal to the deserialized string.
This is the serialization code (where encoding = Charset.forName( "UTF-16BE" ) and str = "\ude4e"):
ByteArrayOutputStream out = new ByteArrayOutputStream();
Writer temp = new OutputStreamWriter( out, encoding );
temp.write( str );
temp.close();
byte[] bytes = out.toByteArray();
String deserialized = new String( bytes, encoding );
So what am I doing wrong?
Thanks!
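DE4E is one half of a surrogate pair (a low surrogate). By itself, it’s invalid. It will be converted to ? (or another replacement character, depending on the charset) or discarded by the OutputStreamWriter, without any error being reported. If you use the java.nio classes, such as CharsetEncoder, you can see the errors.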
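As a rough sketch of what that could look like: the snippet below encodes with a CharsetEncoder set to CodingErrorAction.REPORT, so a lone surrogate raises a CharacterCodingException instead of being silently replaced. The class name SurrogateCheck and the helpers encodeStrict and roundTrips are made up for illustration and are not part of the original code.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class SurrogateCheck {

    // Hypothetical helper: encodes strictly, throwing on invalid input
    // instead of substituting a replacement character.
    static byte[] encodeStrict(String s, Charset cs) throws CharacterCodingException {
        CharsetEncoder encoder = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buf = encoder.encode(CharBuffer.wrap(s));
        byte[] bytes = new byte[buf.remaining()];
        buf.get(bytes);
        return bytes;
    }

    // Hypothetical helper: true only if the string encodes without error
    // and decodes back to an equal string.
    static boolean roundTrips(String s, Charset cs) {
        try {
            byte[] bytes = encodeStrict(s, cs);
            return s.equals(new String(bytes, cs));
        } catch (CharacterCodingException e) {
            // A lone surrogate ends up here (reported as malformed input).
            return false;
        }
    }

    public static void main(String[] args) {
        Charset cs = StandardCharsets.UTF_16BE;
        System.out.println(roundTrips("\ude4e", cs));       // lone low surrogate: false
        System.out.println(roundTrips("\ud83d\ude4e", cs)); // complete surrogate pair: true
    }
}
```

If the goal is to serialize arbitrary char sequences, including invalid ones, a text encoding may not be the right tool; writing the raw char values yourself would be one alternative.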