I’m parsing a file with XmlPullParser on Android.
Everything works fine except for some special HTML characters in the text, like these:
&iacute; should be í
&eacute; should be é
but they are missing from the Strings I extract:
cami&oacute;n should be camión, but I get camin
and the same happens with other similar characters.
I don’t know exactly where the problem is: in
XmlPullParser.getText() or in Java’s String handling.
How can I solve this?
The problem is that plain XML does not have HTML entities.
&eacute; is not defined in XML, which predefines only &amp;, &lt;, &gt;, &apos; and &quot;.
You either have to use an HTML parser (as in the above suggestions) or else translate the entities yourself in XmlPullParser.
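Your loop would have to be driven by nextToken() rather than next(), because only nextToken() reports entity references as separate events.
You would then have to handle the XmlPullParser.ENTITY_REF event and supply the replacement character yourself.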
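A minimal sketch of the "translate the entities yourself" route, in case it helps. The `HtmlEntities` class and its `resolve` method are hypothetical names, not part of XmlPullParser, and the table lists only a handful of Latin-1 entities as an assumption about what your data contains. The pull loop is shown in a comment because it needs the Android/XmlPull classes; treat it as an outline, not a tested implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: maps HTML entity names to replacement text. */
final class HtmlEntities {
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        // Assumption: only a few entries are listed; extend as your data requires.
        NAMED.put("aacute", "\u00E1"); // a with acute accent
        NAMED.put("eacute", "\u00E9"); // e with acute accent
        NAMED.put("iacute", "\u00ED"); // i with acute accent
        NAMED.put("oacute", "\u00F3"); // o with acute accent
        NAMED.put("uacute", "\u00FA"); // u with acute accent
        NAMED.put("ntilde", "\u00F1"); // n with tilde
    }

    private HtmlEntities() {}

    /**
     * Takes the entity name (what XmlPullParser.getName() reports on an
     * ENTITY_REF event) and returns its replacement. Unknown names come
     * back in "&name;" form so that nothing is silently dropped.
     */
    static String resolve(String name) {
        String replacement = NAMED.get(name);
        return replacement != null ? replacement : "&" + name + ";";
    }

    // Outline of the pull loop (needs org.xmlpull.v1.XmlPullParser, not compiled here):
    //
    //   StringBuilder text = new StringBuilder();
    //   int event = parser.getEventType();
    //   while (event != XmlPullParser.END_DOCUMENT) {
    //       if (event == XmlPullParser.TEXT) {
    //           text.append(parser.getText());
    //       } else if (event == XmlPullParser.ENTITY_REF) {
    //           String known = parser.getText(); // may be null for entities XML does not define
    //           text.append(known != null ? known : HtmlEntities.resolve(parser.getName()));
    //       }
    //       event = parser.nextToken();
    //   }
}
```

Keep in mind that with nextToken() the text of one element can arrive as several TEXT and ENTITY_REF events, so you need to accumulate the pieces (as the StringBuilder above does) instead of reading a single getText() per element.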
Of course, if you can change your input file to encode the characters directly in UTF-8 or ISO-8859-1 instead of using HTML entities, that would work too.
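If you cannot change the file at its source, one possible variation on this is to rewrite the entity references to literal characters before the parser sees the text. The sketch below is only an illustration: `EntityPreprocessor` and `toLiteralCharacters` are made-up names, the table covers just a few entities, and the replacement is a plain text substitution, so it would also touch references inside CDATA sections.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical preprocessor: replaces HTML entity references with literal characters. */
final class EntityPreprocessor {
    // Matches named references such as &eacute; and captures the name.
    private static final Pattern NAMED_REF = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    private static final Map<String, String> LITERALS = new HashMap<>();
    static {
        // Assumption: extend this table with whatever entities your input uses.
        LITERALS.put("eacute", "\u00E9");
        LITERALS.put("iacute", "\u00ED");
        LITERALS.put("oacute", "\u00F3");
    }

    private EntityPreprocessor() {}

    static String toLiteralCharacters(String xml) {
        Matcher m = NAMED_REF.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String literal = LITERALS.get(m.group(1));
            // Names not in the table, including XML's own &amp; and &lt;, stay untouched.
            String replacement = literal != null ? literal : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The result then has to reach the parser in an encoding that matches what you declare, for example by passing it through a StringReader or by writing it back out as UTF-8.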