I have an XML file that contains a part like the one below. img and br are not meant to be tags, but when parsing, SAX treats them as tags, and because they have no closing tags, SAX raises an error. How do I overcome this? How can I make the parser ignore img and br?
Thank you!
<summary xml:base="http://www.dailymail.co.uk/health/index.html?ITO=1490" xml:lang="en-GB" type="html">
<img src="http://i.dailymail.co.uk/i/pix/2011/10/30/article-2055372-01A8032A0000044D-515_87x84.jpg" width="87" height="84"><br>Millions take statins to combat heart disease by lowering cholesterol, but research suggests that high cholesterol could be a key factor in the development of breast cancer.
</summary>
That is not well-formed XML. In XML, every element must be closed, either with a closing tag (<br>...</br>) or implicitly as an empty tag (<br/>). If some markup characters are required as text, then they should either be embedded in a CDATA section or be escaped using character entity references (&lt; for <, &gt; for >, &amp; for &).
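The two options can be sketched as follows. The question does not say which SAX implementation is in use, so this is only an illustration that assumes Python's standard `xml.sax` module; the shortened `HTML` string, the `pic.jpg` file name, and the `TextCollector` and `parse_summary` names are made up for the example.

```python
import xml.sax
from xml.sax.saxutils import escape

# Shortened stand-in for the HTML inside <summary>; pic.jpg is a placeholder.
HTML = '<img src="pic.jpg" width="87" height="84"><br>Millions take statins.'

# Option 1: wrap the HTML in a CDATA section so it is read as plain text.
cdata_doc = '<summary type="html"><![CDATA[' + HTML + ']]></summary>'

# Option 2: escape &, < and > as character entity references.
escaped_doc = '<summary type="html">' + escape(HTML) + '</summary>'


class TextCollector(xml.sax.ContentHandler):
    """Records every element name seen and collects all character data."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self.chunks = []

    def startElement(self, name, attrs):
        self.elements.append(name)

    def characters(self, content):
        # SAX may deliver text in several pieces, so collect and join later.
        self.chunks.append(content)


def parse_summary(doc):
    """Parse a document string; return (element names, concatenated text)."""
    handler = TextCollector()
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return handler.elements, "".join(handler.chunks)


for doc in (cdata_doc, escaped_doc):
    elements, text = parse_summary(doc)
    print(elements, text == HTML)
```

In both cases the parser should report only the summary element, and the handler receives the original HTML as ordinary text, which can then be passed to an HTML parser if needed. Note that either fix has to be applied by whoever produces the file, or in a preprocessing step before the SAX parser sees it.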
SAX has no way of knowing that some markup should be treated as XML and other markup should not, just because the latter happens to be HTML. If it sees <br>, it's going to assume that this starts a br element and that a corresponding closing tag will be encountered later.
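To see this behaviour in isolation, here is a small sketch, again assuming Python's `xml.sax` rather than whatever parser the question actually uses. The `BROKEN` string is a shortened version of the fragment from the question, and `try_parse` is a hypothetical helper name.

```python
import xml.sax

# Shortened version of the fragment from the question; pic.jpg is a placeholder.
BROKEN = (
    '<summary type="html">'
    '<img src="pic.jpg" width="87" height="84"><br>Millions take statins.'
    '</summary>'
)

# Same content with img and br written as empty-element tags.
WELL_FORMED = (
    '<summary type="html">'
    '<img src="pic.jpg" width="87" height="84"/><br/>Millions take statins.'
    '</summary>'
)


def try_parse(doc):
    """Return None if the document parses, else the parser's error message."""
    try:
        xml.sax.parseString(doc.encode("utf-8"), xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc.getMessage()
    return None


print("broken:", try_parse(BROKEN))
print("well-formed:", try_parse(WELL_FORMED))
```

The first call should return an error message, because the parser reaches </summary> while img and br are still open. The second should return None, since both elements are explicitly closed.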