I am parsing XML returned from a website but sadly it is slightly malformed. I am getting XML like:
<tag attrib="Buy two for &pound;1" />
Which, I am informed, is invalid because &pound; is an HTML entity, not an XML entity, and so cannot appear in an attribute.
What can I do to fix this, assuming I cannot tell the website to obey the rules? I am considering using a FilterInputStream to filter the arriving data before it gets to the SAX parser but this seems over the top.
In the end I failed to do this with the parser. My solution was to write a FilterInputStream that converted all &xxxx; references into their &#nnnn; form.
XML.java: for completeness. Please confirm the completeness of the list.
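For readers who want to try the approach, here is a minimal sketch of such a filter. It is not the original XML.java: the class name `EntityFilterInputStream`, the helper `filterToString`, and the four-entry entity table are assumptions made for illustration, and the table is deliberately incomplete. It also assumes the stream uses an ASCII-compatible encoding such as UTF-8 or ISO-8859-1, so that entity names can be matched byte by byte.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Rewrites known HTML named entities (e.g. &pound;) as numeric references (&#163;). */
class EntityFilterInputStream extends FilterInputStream {
    // Illustrative, partial table (an assumption); a real one needs the full HTML entity list.
    private static final Map<String, Integer> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("pound", 163);
        ENTITIES.put("nbsp", 160);
        ENTITIES.put("copy", 169);
        ENTITIES.put("euro", 8364);
    }
    private static final int MAX_NAME = 10; // longest name we bother to buffer

    private final PushbackInputStream src;
    private byte[] pending = new byte[0];   // bytes queued to follow an '&'
    private int pos = 0;

    EntityFilterInputStream(InputStream in) {
        super(new PushbackInputStream(in, 1));
        this.src = (PushbackInputStream) this.in;
    }

    private static boolean isNameChar(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    @Override
    public int read() throws IOException {
        if (pos < pending.length) {
            return pending[pos++] & 0xFF;
        }
        int c = src.read();
        if (c != '&') {
            return c;
        }
        // Buffer a candidate entity name after the '&'.
        StringBuilder name = new StringBuilder();
        int next = src.read();
        while (isNameChar(next) && name.length() < MAX_NAME) {
            name.append((char) next);
            next = src.read();
        }
        Integer code = (next == ';') ? ENTITIES.get(name.toString()) : null;
        if (code != null) {
            // Known entity: emit the numeric form and consume the ';'.
            pending = ("#" + code + ";").getBytes(StandardCharsets.US_ASCII);
        } else {
            // Unknown or not an entity: pass the text through unchanged.
            if (next != -1) {
                src.unread(next);
            }
            pending = name.toString().getBytes(StandardCharsets.US_ASCII);
        }
        pos = 0;
        return '&';
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            int c = read();
            if (c == -1) {
                break;
            }
            b[off + n++] = (byte) c;
        }
        return n == 0 ? -1 : n;
    }

    /** Convenience helper for trying the filter on a string (hypothetical, for demonstration). */
    static String filterToString(String xml) {
        try (InputStream in = new EntityFilterInputStream(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int c;
            while ((c = in.read()) != -1) {
                out.write(c);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

To use it with SAX, wrap the raw stream before handing it to the parser, for example `SAXParserFactory.newInstance().newSAXParser().parse(new EntityFilterInputStream(rawStream), handler)`, where `rawStream` and `handler` are your own input stream and `DefaultHandler`. References the table does not know about, including the XML built-ins such as `&amp;` and existing numeric references, pass through untouched.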