I have an XML file encoded in UTF-8. When I open it in Java,

Asked: May 27, 2026

I have an XML file encoded in UTF-8. When I open it in Java, some (in theory valid) characters remain encoded. For example, I try to get the &#66352; character:

String str = new String(line.getBytes("UTF-8"));
System.out.println(str.charAt(pos));

where pos is the index at which the character should be.
Instead, I get the & character.

When I open the file in Notepad++ and make sure the encoding is set to UTF-8, I see the same problem.

As I see it, there are two options: work only with the numeric codes from the start (no characters), or replace all the codes with the actual characters.

What should I do and how?

1 Answer

Editorial Team · Answer 1 · 2026-05-27T03:04:15+00:00

Please don’t construct a String from a byte array without specifying a charset; that’s always a sign of a problem.
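To illustrate the charset point, here is a minimal sketch of decoding bytes with an explicit charset. The class and method names are hypothetical and chosen only for this example; the four bytes are the UTF-8 encoding of code point 66352 (U+10330).

```java
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    // Decode raw bytes as UTF-8 instead of relying on the platform default charset.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // UTF-8 encoding of U+10330 (decimal 66352): F0 90 8C B0
        byte[] raw = {(byte) 0xF0, (byte) 0x90, (byte) 0x8C, (byte) 0xB0};
        String s = decodeUtf8(raw);
        System.out.println(s.length());        // 2 (one code point, two char units)
        System.out.println(s.codePointAt(0));  // 66352
    }
}
```

With `new String(bytes)` and no charset, the result would depend on the default charset of the machine running the code.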
If charAt returns the ampersand character, then either you are not using an XML parser to load the file, or the character is double-encoded, like &amp;#66352;.
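The difference between the two cases can be seen with the JDK's built-in DOM parser. This is only a sketch: the class name, the helper `textOf`, and the `<w>` element are made up for the example, and the real file's structure is not known.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class CharRefDemo {
    // Parse an XML string and return the text content of its root element.
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String header = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
        // A plain character reference is resolved by the parser.
        String single = textOf(header + "<w>&#66352;</w>");
        // A double-encoded reference comes back as literal text starting with '&'.
        String doubled = textOf(header + "<w>&amp;#66352;</w>");

        System.out.println(single.codePointAt(0)); // 66352
        System.out.println(doubled.charAt(0));     // &
    }
}
```

If the file is read line by line as plain text instead, no reference is resolved at all, and the first char of `&#66352;` is also `&`.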
The character 66352 won’t fit into Java’s 16-bit char datatype, so it is stored as two surrogate characters in a String. You should use the codePointAt method in this case.
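A short sketch of what that looks like in practice. The class name and the helper `codePointAtIndex` are hypothetical; the helper assumes you want to index by code point rather than by char unit, and uses the standard `offsetByCodePoints` method to translate between the two.

```java
public class CodePointDemo {
    // Return the n-th code point of s (0-based), counting a surrogate pair as one.
    static int codePointAtIndex(String s, int n) {
        return s.codePointAt(s.offsetByCodePoints(0, n));
    }

    public static void main(String[] args) {
        // "a", then code point 66352 (U+10330), then "b"
        String s = "a" + new String(Character.toChars(66352)) + "b";

        System.out.println(s.length());                           // 4 char units
        System.out.println(s.codePointCount(0, s.length()));      // 3 code points
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
        System.out.println(s.codePointAt(1));                     // 66352
        System.out.println(codePointAtIndex(s, 2));               // 98, i.e. 'b'
    }
}
```

Note that `charAt(1)` on this string returns only the high surrogate (0xD800), which is not a printable character on its own.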
