I am constructing some XML in Java. Some nodes in the XML may have values in Korean or another language. After constructing it, how do I make sure that the whole XML is UTF-8 encoded? Do I need to explicitly convert the string to UTF-8 by using something like:
string = new String(s.getBytes(), "UTF-8");
Or will the whole string be automatically in UTF-8?
Also, if I receive some XML containing a value like this: <name>[B@19821f</name>, how do I know that [B@19821f is the UTF-8 form of some Korean word?
A string contains characters. The encoding is irrelevant until you transform the string into bytes. This happens when you call String.getBytes(), or when you write the String to a stream (file, socket, whatever).
Make sure you use an OutputStreamWriter to write your XML string, and that you specify UTF-8 as the charset when constructing this OutputStreamWriter. If you're using a dedicated marshalling API like JAXB, set the appropriate property so that the UTF-8 encoding is used, and the generated XML contains its encoding (in the <?xml ...?> header). Without knowing which API you're using to generate your XML string, it's hard to be more helpful.
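
As a rough illustration of the OutputStreamWriter advice, here is a minimal sketch that writes an XML string through an OutputStreamWriter constructed with UTF-8 and then decodes the bytes again to check the round trip. The class name Utf8XmlExample, the helper toUtf8Bytes, and the sample XML are made up for this example, and a ByteArrayOutputStream stands in for whatever file or socket stream you would really write to.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8XmlExample {

    // Hypothetical helper: encodes the XML string as UTF-8 bytes by
    // writing it through an OutputStreamWriter with an explicit charset.
    static byte[] toUtf8Bytes(String xml) {
        // Stand-in for a FileOutputStream or socket stream.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(xml);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Sample document; the escapes are the two Korean syllables of "Hangul".
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<name>\uD55C\uAE00</name>";

        byte[] bytes = toUtf8Bytes(xml);

        // Decoding with the same charset gives back the original characters.
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(decoded.equals(xml)); // prints "true"
    }
}
```

Note that the String itself is never "converted" to UTF-8. The charset only matters at the point where characters become bytes (the writer) and where bytes become characters again (the String constructor taking a charset).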