I have a “windows-1255” encoded String. Is there any safe way I can convert it to a “UTF-8”
String and vice versa?
In general, is there a safe way (meaning data will not be damaged) to convert between
encodings in Java?
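byte[] bytes = str.getBytes("UTF-8");   // encode the String as UTF-8 bytes
String decoded = new String(bytes, "UTF-8");   // decode UTF-8 bytes back into a String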
If the original string is not encoded as “UTF-8”, can the data be damaged?
You can’t have a String object in Java properly encoded as anything other than UTF-16, as that’s the sole encoding for those objects defined by the spec. Of course you could do something untoward like put 1252 values in a char[] and create a String from it, but things will go wrong pretty much immediately.
What you can have is a byte[] encoded in various different ways, and you can convert it to and from a String using the constructors which take a Charset, and with getBytes as in your code.
So you can do conversions using a String as an intermediate. I don’t know of any way in the JDK to do a direct conversion, but the intermediate is likely not too costly in practice.
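As a minimal sketch of that approach (the class and method names here are made up for illustration, and it assumes your JDK ships the windows-1255 charset, which full JDK distributions normally do), you decode the source bytes into a String and re-encode that String in the target charset:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TranscodeSketch {
    // Hypothetical helper: decode bytes from one charset into a String
    // (UTF-16 internally), then encode that String in another charset.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        String text = new String(input, from);
        return text.getBytes(to);
    }

    public static void main(String[] args) {
        // Assumption: windows-1255 is available in this runtime.
        Charset windows1255 = Charset.forName("windows-1255");

        // Four Hebrew letters (shin, lamed, vav, final mem) as windows-1255 bytes.
        byte[] hebrew = {(byte) 0xF9, (byte) 0xEC, (byte) 0xE5, (byte) 0xED};

        byte[] utf8 = transcode(hebrew, windows1255, StandardCharsets.UTF_8);
        byte[] back = transcode(utf8, StandardCharsets.UTF_8, windows1255);

        // Every character here exists in both charsets, so the round trip is lossless.
        System.out.println(Arrays.equals(hebrew, back));
    }
}
```

Using the Charset overloads rather than the charset-name overloads avoids the checked UnsupportedEncodingException for the standard charsets.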
About round-trip conversions: it is not generally true that you can convert between encodings without losing data. Only a few encodings can handle the full range of Unicode characters (e.g. the UTF family, GB18030), while many legacy character sets encode only a small subset. You can’t safely round trip through those character sets without losing data, unless you are sure the input falls into the representable set.
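If you are not sure the input is representable, one way to find out is to ask a CharsetEncoder rather than rely on getBytes, which silently substitutes a replacement byte (typically '?') for anything it cannot map. The following is only a sketch; the class and helper names are invented for illustration, and it again assumes windows-1255 is available in your runtime:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class StrictEncodeSketch {
    // Hypothetical helper: true if every character of text can be encoded in target.
    static boolean isRepresentable(String text, Charset target) {
        return target.newEncoder().canEncode(text);
    }

    // Hypothetical helper: encode, but throw instead of substituting '?'.
    static byte[] encodeStrict(String text, Charset target) throws CharacterCodingException {
        CharsetEncoder encoder = target.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
        byte[] out = new byte[buffer.remaining()];
        buffer.get(out);
        return out;
    }

    public static void main(String[] args) {
        // Assumption: windows-1255 is available in this runtime.
        Charset windows1255 = Charset.forName("windows-1255");
        String chinese = "\u4E2D"; // a CJK character that windows-1255 cannot represent

        System.out.println(isRepresentable(chinese, windows1255));

        // Lenient path: getBytes replaces the character, so the round trip changes the text.
        String lossy = new String(chinese.getBytes(windows1255), windows1255);
        System.out.println(lossy.equals(chinese));

        // Strict path: the encoder reports the problem instead of hiding it.
        try {
            encodeStrict(chinese, windows1255);
        } catch (CharacterCodingException e) {
            System.out.println("not representable: " + e);
        }
    }
}
```

Checking up front, or encoding strictly, turns silent data loss into something you can detect and handle.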