I just can't figure this out at the moment, even though it should be standard: I receive an XML document containing some UTF-8 characters, and I want to parse it.
Here is an example:
<person><name>Nguyển Thị Ngân</name></person>
When I parse this with GWT's XMLParser and print the value of the name node, the characters are corrupted:
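import com.google.gwt.xml.client.Document;
import com.google.gwt.xml.client.NodeList;
import com.google.gwt.xml.client.XMLParser;

String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><person><name>Nguyển Thị Ngân</name></person>";
Document doc = XMLParser.parse(xml);
// Print the text content of every <name> element
NodeList list = doc.getElementsByTagName("name");
for (int i = 0; i < list.getLength(); i++) {
    System.out.println("XMLParser: " + list.item(i).getFirstChild().getNodeValue());
}
// Print the raw string for comparison
System.out.println("System.out: " + xml);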
The output is:
XMLParser: Nguyá»n Thá» Ngân
System.out: <?xml version="1.0" encoding="UTF-8"?><person><name>Nguyển Thị Ngân</name></person>
I take this to mean that the garbled characters have nothing to do with printing via System.out.
What could be the problem here?
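One way to narrow this down: the garbled output is consistent with the UTF-8 bytes of the name being decoded one byte per character as ISO-8859-1. For example, "â" (U+00E2) is the two bytes C3 A2 in UTF-8, which ISO-8859-1 reads as "Ã¢"; "ể" (U+1EC3) is E1 BB 83, which reads as "á»" followed by the invisible control character U+0083, which is why a letter seems to vanish. The plain Java SE sketch below reproduces that hypothesis. `MojibakeDemo` and `misdecode` are names made up for illustration, and the sketch does not show at which step of the GWT build or runtime the wrong decoding actually happens.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode the text as UTF-8, then (wrongly) decode those bytes as ISO-8859-1,
    // so every UTF-8 byte becomes exactly one Latin-1 character.
    public static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The name from the question, written with Unicode escape sequences
        // so the encoding of this source file cannot affect it.
        String original = "Nguy\u1EC3n Th\u1ECB Ng\u00E2n";
        String garbled = misdecode(original);
        System.out.println(garbled);
        // Print the code points so the invisible C1 control characters show up too.
        garbled.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
    }
}
```

If the code points of the misdecoded string match what the GWT app produces, the problem is a charset mismatch somewhere before or during parsing rather than in the XML itself.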
I think the issue is the one Thomas Broyer described. It can't be what chooban suggests, because printing the raw XML works as expected. You could try replacing the non-ASCII characters with their XML escape codes (numeric character references):
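A minimal sketch of that replacement in plain Java SE, assuming the escaping is applied to the XML string before it is handed to `XMLParser.parse`. The helper name `escapeNonAscii` is made up for illustration; numeric character references such as `&#x1EC3;` are standard XML, so a conforming parser turns them back into the original characters. Because the references are pure ASCII, the escaped text survives an ASCII-compatible encoding mismatch, but it cannot repair text that was already garbled before escaping. Whether this compiles unchanged under GWT's JRE emulation has not been checked.

```java
public class XmlEscape {
    // Replace every non-ASCII code point with a hexadecimal XML numeric
    // character reference (&#xHHHH;). ASCII, including markup characters, is copied unchanged.
    public static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (cp < 0x80) {
                sb.append((char) cp);
            } else {
                sb.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            }
            // Advance by one or two chars (a surrogate pair is a single code point).
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<person><name>Nguy\u1EC3n Th\u1ECB Ng\u00E2n</name></person>";
        System.out.println(escapeNonAscii(xml));
    }
}
```

For the name in the question this yields `Nguy&#x1EC3;n Th&#x1ECB; Ng&#xE2;n`, which can also be written directly into the source string instead of the raw characters.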