The Archive Base

Editorial Team
Asked: May 18, 2026


I have an XMLEventReader. It has been built from an XMLInputFactory with the “UTF8” encoding. I am using it to read an XML file whose “encoding” attribute is set to “UTF-8”.

I have verified that the XML file views correctly under Firefox. When you view the page encoding, it says that it is UTF-8.

I have set the XMLEventReader to coalesce character events like this:

reader.setProperty(XMLEventReader.IS_COALESCING, Boolean.TRUE);
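For reference, in the standard StAX API the coalescing property is defined on XMLInputFactory (XMLInputFactory.IS_COALESCING) and is set on the factory before the reader is created. A minimal sketch of that setup follows; the class name CoalescingReaderSetup and the helper open are hypothetical names used only for illustration, and the sample string is made up.

```java
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original question.
public class CoalescingReaderSetup {

    // Builds an event reader that is asked to merge adjacent character data.
    static XMLEventReader open(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // IS_COALESCING is a factory property; set it before creating the reader.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        // The encoding is passed explicitly; "UTF-8" is the canonical charset name.
        return factory.createXMLEventReader(in, "UTF-8");
    }

    public static void main(String[] args) throws XMLStreamException {
        // Made-up sample containing a "smart apostrophe" (U+2019).
        byte[] xml = "<text>It\u2019s hard</text>".getBytes(Charset.forName("UTF-8"));
        XMLEventReader reader = open(new ByteArrayInputStream(xml));
        while (reader.hasNext()) {
            System.out.println(reader.nextEvent());
        }
    }
}
```

Reading from an InputStream rather than a Reader leaves the byte-to-character decoding to the parser, which avoids depending on the platform default charset at that step.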

The XML document does not have a DTD. It is valid.

The XMLEventReader will occasionally report a CHARACTERS event whose content looks like this, for example:

r problems were most severe and frequent.) Did you sleep a lot more than usual nearly every night during that period?</text>  Ð 

Note the presence of the closing markup tag near the end of the sample, as well as the stray Ð (a capital eth). Note also that the start of the sentence has been lopped off; presumably an earlier CHARACTERS event contains the leading part of the sentence.

Why does the XMLEventReader screw up the parsing? Why are the characters not displaying correctly? Why does the XMLEventReader not coalesce CHARACTERS events, if that’s what’s going on? Why is StAX so unbelievably festeringly ugly and unpredictable?

I am using the XMLEventReader supplied to me by my Java runtime (Java 6) on a Mac.

Here is some sample XML, which of course I’ve simply copied from my editor, so who knows what character conversions occurred as a result of that, but anyhow:

<question id="BMHPD17">
  <permittedResponseCount>1</permittedResponseCount>
  <text>It’s hard for me to stay out of trouble. (Would you say this is true or false for you?)</text>
  <namedAnswerSet idref="TrueFalse"></namedAnswerSet>
</question>

Note the “smart apostrophe” on line 3.

I am reading this by reacting to a CHARACTERS event and saving its contents to a local String variable, then reacting to the END_ELEMENT event whose name is “question”. Upon receiving that END_ELEMENT event, I retrieve the saved String and construct a Java object that takes it as input.
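A common way to make this kind of loop robust against text arriving in more than one CHARACTERS event is to append to a buffer between the start and end of the element instead of keeping only the latest event. The sketch below assumes that approach; QuestionTextCollector and collectTexts are hypothetical names, and it collects at the end of each text element rather than at the end of question, purely to keep the example short.

```java
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not the code from the question.
public class QuestionTextCollector {

    // Returns the content of every <text> element, joining split CHARACTERS events.
    static List<String> collectTexts(XMLEventReader reader) throws XMLStreamException {
        List<String> texts = new ArrayList<String>();
        StringBuilder buffer = null; // non-null only while inside a <text> element
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()
                    && "text".equals(event.asStartElement().getName().getLocalPart())) {
                buffer = new StringBuilder();
            } else if (event.isCharacters() && buffer != null) {
                // Append instead of overwrite, so no fragment is lost.
                buffer.append(event.asCharacters().getData());
            } else if (event.isEndElement() && buffer != null
                    && "text".equals(event.asEndElement().getName().getLocalPart())) {
                texts.add(buffer.toString());
                buffer = null;
            }
        }
        return texts;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<question id=\"BMHPD17\"><text>It\u2019s hard for me.</text></question>";
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        System.out.println(collectTexts(reader));
    }
}
```

With this shape the result does not depend on whether the parser delivers the element content as one event or several.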

When I System.out.println() the result, I get (sometimes) the bogus junk I referred to earlier.

When I wrap System.out inside a PrintWriter with “UTF8” encoding set, so that I’m not simply outputting characters according to the platform’s encoding, I get the same results.

1 Answer

  1. Editorial Team
    Added an answer on May 18, 2026 at 12:52 am

    This turns out to be a bug in the Mac OS X JVM. The character encoding used by the console does not default to UTF-8, even though every other use of the default character encoding is UTF-8.
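One way to investigate a mismatch like this is to print the encodings the JVM reports and to send output through a PrintStream with an explicit charset. This is only a sketch: Utf8Console and utf8 are hypothetical names, and whether the characters then display correctly still depends on the encoding the terminal itself expects.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

// Hypothetical helper class for illustration only.
public class Utf8Console {

    // Wraps a stream so text is always encoded as UTF-8, whatever the platform default is.
    static PrintStream utf8(OutputStream out) throws UnsupportedEncodingException {
        return new PrintStream(out, true, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // What the JVM believes the default encoding to be.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // Output through an explicitly UTF-8 stream.
        PrintStream out = utf8(System.out);
        out.println("It\u2019s hard for me to stay out of trouble.");
    }
}
```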
