I’ve run into a problem several times when parsing text in a UTF-8 XML file: a leading “?” appears on the first piece of data I parse.
Here is the XML:
<dictionary>
<word id="1" level="1" simp="爱" trad="愛">
<pinyin>ai4</pinyin>
<part>verb</part>
<definition>to love</definition>
</word>
Here is the SAX:
@Override
public void startElement(String namespaceURI, String localName,
        String qName, Attributes atts) throws SAXException {
    if (localName.equals("word")) {
        word = new Word();
        word.setId(atts.getValue("id"));
When it pulls the first id, it gets “?1” instead of just “1”, but it doesn’t do this for any of the data after that point. The exception it throws is:
04-30 21:42:42.240: E/AndroidRuntime(1418): Caused by: java.lang.NumberFormatException: unable to parse ‘?1’ as integer
I don’t see a “?” when I physically open the XML file, so where is it coming from? Why is it only affecting the first thing?
My guess is that you have an encoding problem. Does your input file have an XML declaration that specifies the encoding?
If the encoding isn’t declared, anything can happen.
Were all of these files created with the encoding set explicitly? If not, some tools may corrupt the encoding, especially cut-and-paste and certain text editors.
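One common way for an invisible character to end up only at the very start of a file is a UTF-8 byte order mark (the bytes EF BB BF), which some editors add when saving; if it is not stripped, it decodes to U+FEFF and often gets displayed as “?”. I can’t tell from the question whether that is what is happening here, so treat the following as a rough sketch under that assumption. The class name BomSkipper and the method openUtf8 are made up for illustration and are not part of any library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;

// Hypothetical helper, not part of SAX or Android.
final class BomSkipper {

    // Returns a UTF-8 reader over the stream, discarding a leading
    // UTF-8 byte order mark (EF BB BF) if one is present.
    static Reader openUtf8(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int read = 0;
        // A single read() may return fewer than 3 bytes, so loop.
        while (read < 3) {
            int n = pushback.read(head, read, 3 - read);
            if (n < 0) {
                break; // end of stream before 3 bytes were available
            }
            read += n;
        }
        boolean hasBom = read == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && read > 0) {
            // Not a BOM: put the bytes back so the parser sees them.
            pushback.unread(head, 0, read);
        }
        return new InputStreamReader(pushback, "UTF-8");
    }
}
```

The returned reader can then be wrapped in an org.xml.sax.InputSource and handed to the SAX parser in place of the raw stream. Since the reader already decodes the bytes, the parser no longer has to guess the encoding.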