I’m looking to get an explanation on why my SAX parser fails when some

Editorial Team

Asked: June 7, 2026

I’m looking to get an explanation on why my SAX parser fails when some special UTF-8 characters are inside my XML file.

To parse the XML file I use Document doc = builder.parse(inputSource);

However when I use an inputSource it works fine:

DocumentBuilder builder = factory.newDocumentBuilder();
InputStream in = new FileInputStream(file);
InputSource inputSource = new InputSource(new InputStreamReader(in));
Document doc = builder.parse(inputSource);

I don’t quite understand why the latter works. I’ve seen examples of it being used, but none of them explain why it works.
Does the second version parse a string rather than a file, so that the encoding is assumed to be UTF-8?

1 Answer

Editorial Team · Answer 1 · 2026-06-07T09:03:09+00:00

I suspect your document isn’t really in the encoding you’ve declared. This line:

InputSource inputSource = new InputSource(new InputStreamReader(in));

will decode the binary data into text inside the InputStreamReader, using the platform default encoding. The XML parser no longer gets to do that itself, because it never sees the raw bytes, so the encoding declared in the XML is ignored.
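To make the difference concrete, here is a minimal sketch of the two code paths. The class name EncodingDemo, the helper names and the sample document are made up for illustration, and ISO-8859-1 stands in for a legacy encoding such as Windows-1252 (both encode é as the single byte 0xE9). It assumes the JDK's built-in parser, which decodes UTF-8 strictly.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EncodingDemo {

    // The parser receives raw bytes and chooses the encoding itself,
    // based on the XML declaration (or a byte order mark).
    static String parseFromBytes(byte[] xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml));
        return doc.getDocumentElement().getTextContent();
    }

    // The reader decodes the bytes first; the parser only receives characters,
    // so the encoding named in the XML declaration plays no part.
    static String parseViaReader(byte[] xml, Charset charset) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        InputSource source =
                new InputSource(new InputStreamReader(new ByteArrayInputStream(xml), charset));
        Document doc = builder.parse(source);
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a>caf\u00e9</a>";

        // Hypothetical broken file: declares UTF-8 but is stored as ISO-8859-1.
        byte[] mislabelled = xml.getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the encoding the file is really in hides the problem.
        System.out.println(parseViaReader(mislabelled, StandardCharsets.ISO_8859_1));

        // Letting the parser trust the declaration exposes it.
        try {
            System.out.println(parseFromBytes(mislabelled));
        } catch (Exception e) {
            System.out.println("parse failed: " + e.getClass().getSimpleName());
        }
    }
}
```

Note that new InputStreamReader(in) with no charset argument behaves like parseViaReader with whatever the platform default happens to be, which is why the workaround can succeed on one machine and fail on another.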

If this is working, your XML file is probably subtly broken: it may declare that it’s in UTF-8 while actually being stored in the platform default encoding (e.g. Windows-1252). Rather than use the workaround, you should fix the XML if you have any choice about it.
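One way to test this suspicion is to check whether the file's bytes are valid UTF-8 at all. The following is only a sketch: the class name Utf8Check and the helper isValidUtf8 are made up for illustration, and the file path is taken from the command line. A file that passes may still be mislabelled in some other way, but a file that fails is certainly not the UTF-8 it claims to be.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {

    // Returns true only if every byte sequence in data is well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    // Report bad input as an exception instead of silently
                    // substituting the replacement character.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path of the XML file to inspect.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(data) ? "valid UTF-8" : "not valid UTF-8");
    }
}
```

If the check fails, either re-save the file as real UTF-8 or change the encoding in its XML declaration to match what it is actually stored in.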
