My free webhost appends analytics JavaScript to all PHP and HTML files. That's fine, except that I want to send XML to my Android app, and the appended script makes my files invalid XML.
Since the XML is parsed in its entirety (and blows up) before being passed along to my SAX ContentHandler, I can’t just catch the exception and continue merrily along with a fleshed-out object. (Which I tried, and then felt sheepish about.)
Any suggestions on a reasonably efficient strategy?
I’m about to create a class that will take my InputStream, read through it until I find the junk, break, then take the buffer I just wrote to, convert it back into an InputStream and pass it along like nothing happened. But I’m worried that it’ll be grossly inefficient, that it’ll have bugs I shouldn’t have to deal with (e.g. breaking on binary values such as embedded images), and that it’s hopefully unnecessary.
FWIW, this is part of an Android project, so I’m using the android.util.Xml class (see source code). When I traced the exception, it took me to a native appendChars function that is itself being called from a network of private methods anyway, so subclassing anything seems to be unreasonably useless.
Here’s the salient bit from my stacktrace:
E/AndroidRuntime( 678): Caused by: org.apache.harmony.xml.ExpatParser$ParseException: At line 3, column 0: junk after document element
E/AndroidRuntime( 678): at org.apache.harmony.xml.ExpatParser.parseFragment(ExpatParser.java:523)
E/AndroidRuntime( 678): at org.apache.harmony.xml.ExpatParser.parseDocument(ExpatParser.java:482)
E/AndroidRuntime( 678): at org.apache.harmony.xml.ExpatReader.parse(ExpatReader.java:320)
E/AndroidRuntime( 678): at org.apache.harmony.xml.ExpatReader.parse(ExpatReader.java:277)
I guess in the end I’m asking for opinions on whether the InputStream -> manually parse to OutputStream -> recreate InputStream -> pass along solution is as horrible as I think it is.
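For concreteness, the round trip I have in mind looks roughly like the sketch below. The class name XmlTrimmer, the method trimAfter, and the idea of cutting at a known closing root tag (here "</root>") are placeholders I made up for illustration, and it assumes UTF-8. It buffers the whole response before searching, which is exactly the part that feels wasteful.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: buffers the response, then cuts it after a known end marker.
class XmlTrimmer {

    // Returns a stream holding everything up to and including the first endMarker.
    // If the marker is never found, the whole input is passed through unchanged.
    static InputStream trimAfter(InputStream in, String endMarker) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        byte[] data = out.toByteArray();
        byte[] marker = endMarker.getBytes("UTF-8"); // assumption: document is UTF-8
        int start = indexOf(data, marker);
        int length = (start < 0) ? data.length : start + marker.length;
        return new ByteArrayInputStream(data, 0, length);
    }

    // Plain byte-wise search; returns the index of the first match, or -1.
    private static int indexOf(byte[] data, byte[] marker) {
        outer:
        for (int i = 0; i + marker.length <= data.length; i++) {
            for (int j = 0; j < marker.length; j++) {
                if (data[i + j] != marker[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }
}
```

The trimmed stream would then go to the parser in place of the raw one.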
You could use a FilterInputStream for that; there is no need for an intermediate buffer.
The best thing to do is add a delimiter to the end of the XML, like
--theXML ends HERE--
or a char not found in XML, like a group of 16 \u0004 chars (you then only need to check every 16th byte), and read until you find it.
The implementation assumes a \u0004 delimiter. Note that it misses throws clauses and some error checking, and needs proper debugging.
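A minimal sketch of the filter idea, not the original code: it assumes the XML file on the server ends with at least one \u0004 byte before the host's injected script, and that the document is UTF-8 (where a 0x04 byte cannot occur inside a multi-byte character). The class name DelimitedInputStream is made up for this example. It simplifies the "group of 16" variant to a single-byte check and reports end of stream as soon as the delimiter shows up.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical filter: pretends the stream ends at the first 0x04 byte.
class DelimitedInputStream extends FilterInputStream {
    private static final int DELIMITER = 0x04; // assumed end-of-XML marker
    private boolean finished = false;

    DelimitedInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        if (finished) {
            return -1;
        }
        int b = super.read();
        if (b == -1 || b == DELIMITER) {
            finished = true;
            return -1;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (finished) {
            return -1;
        }
        if (len == 0) {
            return 0;
        }
        int n = super.read(buf, off, len);
        if (n == -1) {
            finished = true;
            return -1;
        }
        // Hand back only the bytes that precede the delimiter, if it is in this chunk.
        for (int i = 0; i < n; i++) {
            if (buf[off + i] == DELIMITER) {
                finished = true;
                return (i == 0) ? -1 : i;
            }
        }
        return n;
    }

    @Override
    public boolean markSupported() {
        return false; // mark/reset and skip are not handled in this sketch
    }
}
```

On Android you would wrap the response stream before handing it to the parser, e.g. Xml.parse(new DelimitedInputStream(rawStream), Xml.Encoding.UTF_8, handler), where rawStream and handler are your own objects. Nothing is buffered beyond what the parser itself reads, so the junk after the delimiter is simply never seen.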