I’m using a SAX parser in my Android application to read a few feeds at a time. The code is executed as follows.
// Begin FeedLezer
try {
/** Handling XML **/
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser sp = spf.newSAXParser();
XMLReader xr = sp.getXMLReader();
/** Send URL to parse XML Tags **/
URL sourceUrl = new URL(
BronFeeds[i]);
/** Create handler to handle XML Tags ( extends DefaultHandler ) **/
Feed_XMLHandler myXMLHandler = new Feed_XMLHandler();
xr.setContentHandler(myXMLHandler);
xr.parse(new InputSource(sourceUrl.openStream()));
} catch (Exception e) {
System.out.println("XML Parsing Exception = " + e);
}
sitesList = Feed_XMLHandler.sitesList;
String titels = sitesList.getMergedTitles();
And here are Feed_XMLHandler.java and Feed_XMLList.java, both of which I basically just took from the web.
However, this code fails at times. I’ll show some examples.
http://imm.io/media/2I/2IAs.jpg
It goes very well here. It even recognizes and displays apostrophes. Even when clicking the articles open, almost all of the text shows, so that’s all good. The source feed is here. I can’t control the feed.
http://imm.io/media/2I/2IB1.jpg Here, it doesn’t go so well. It does display the ï, but it chokes on the apostrophe (there’s supposed to be ‘NORAD’ after the Waarom). The feed is here.
http://imm.io/media/2I/2IBQ.jpg This is the worst one. As you can see, the title only displays an apostrophe, whilst it is supposed to be a ‘blablabla’. Also, the text ends in the middle of the line, without any special characters in the quote. The feed is here
In all cases, I have no control over the feed. I think the code chokes on special characters. How can I make sure SAX fetches all the strings correctly?
If anyone knows an answer to this, you’d really help me out a LOT 😀
Thanks in advance.
This is from the FAQ of Xerces.
Your code looks well adapted from one of the many XML parsing tutorials (like this one here). Now, the tutorial is good and all, but it fails to mention something very important…
Notice this part here…
I bet at this point you’re checking booleans to mark which tag you’re under and then setting a value in some kind of class you made, or something like that. But the problem is that the SAX parser (which is buffered) will not necessarily give you all the characters between a pair of tags in one go. Say you have <tag> Lorem Ipsum...really long sentence...</tag>: your SAX parser may call the characters function several times, in chunks. So the trick here is to keep appending the values to a string variable and only actually set (or commit) it to your structure when the tag ends (i.e. in endElement).
Example
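A minimal sketch of that idea might look like the following. It is not your actual Feed_XMLHandler: the class name ChunkSafeTitleHandler, the parseTitles helper, and the choice of the title tag are all assumptions made for illustration. It appends in characters, commits only in endElement, and uses a StringBuilder as the buffer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration; only the "title" element is collected.
class ChunkSafeTitleHandler extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder();
    private boolean inTitle = false;

    // Depending on namespace settings, the name arrives in localName or qName.
    private static String nameOf(String localName, String qName) {
        return (localName == null || localName.isEmpty()) ? qName : localName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("title".equals(nameOf(localName, qName))) {
            inTitle = true;
            buffer.setLength(0); // start each <title> with an empty buffer
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element: append, never overwrite.
        if (inTitle) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(nameOf(localName, qName))) {
            titles.add(buffer.toString()); // commit once the element is complete
            inTitle = false;
        }
    }

    List<String> getTitles() {
        return titles;
    }

    // Convenience helper (an assumption, not part of the original code).
    static List<String> parseTitles(String xml) throws Exception {
        ChunkSafeTitleHandler handler = new ChunkSafeTitleHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getTitles();
    }
}
```

With a handler along these lines, a title containing character references such as `&#8216;` should come back as one complete string, no matter how many times the parser called characters for it. In your case the same pattern would go into whichever tags Feed_XMLHandler currently stores.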
Also, it would be better to use StringBuilder for the appending, since that’ll be more efficient. Hope it makes sense! If it didn’t, check this and here.