A piece of HTML that I’m trying to parse contains some attribute values without quotation marks, for example the width and height attributes here:
<img src="/static/logo.png" width=75 height=90 />
In the C# code, the reader reads until the next anchor tag.
while (reader.ReadToFollowing("a"))
This statement throws an XmlException:
'75' is an unexpected token. The expected token is '"' or '''. Line 16, position 37.
Is there an XmlReaderSettings option that makes XmlReader more lenient? I do not have control over the generated HTML.
In order to read HTML, you’ll need a reader designed for that purpose. The HtmlAgilityPack can help you here, as can the SgmlReader referred to in this answer to a related question.
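As a rough illustration of the HtmlAgilityPack route, the sketch below loads markup containing unquoted attribute values and lists the anchor tags. It assumes the third-party HtmlAgilityPack NuGet package is installed; the sample markup and the link targets are made up for the example and are not taken from the actual page.

```csharp
using System;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

class Program
{
    static void Main()
    {
        // Hypothetical sample markup modeled on the question:
        // the width and height values are not quoted.
        string html =
            "<html><body>" +
            "<img src=\"/static/logo.png\" width=75 height=90 />" +
            "<a href=\"/home\">Home</a>" +
            "<a href=\"/about\">About</a>" +
            "</body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a");
        if (anchors == null)
        {
            Console.WriteLine("No anchors found.");
            return;
        }

        foreach (HtmlNode a in anchors)
        {
            // The second argument is the fallback used when the attribute is missing.
            string href = a.GetAttributeValue("href", string.Empty);
            Console.WriteLine($"{href} -> {a.InnerText}");
        }
    }
}
```

This replaces the `while (reader.ReadToFollowing("a"))` loop with an XPath query over a tolerant HTML parser, so the unquoted attributes on the img tag no longer stop the read.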
HTML is not XML. Both have roots in SGML, but they follow different rules. XML is much stricter than HTML: every element must be closed, and every attribute value must be enclosed in single or double quotes. Therefore, unless you are parsing well-formed XHTML, XmlReader will not work for you.
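The quoting rule can be checked directly with XmlReader itself. The following is a small sketch using only the standard library; `IsWellFormed` is a hypothetical helper name introduced here for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

class Program
{
    // Hypothetical helper: returns true if XmlReader can read the whole
    // fragment without throwing an XmlException.
    public static bool IsWellFormed(string markup)
    {
        try
        {
            using (var reader = XmlReader.Create(new StringReader(markup)))
            {
                while (reader.Read()) { } // consume every node
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    static void Main()
    {
        // Unquoted attribute values, as in the question: rejected.
        Console.WriteLine(IsWellFormed("<img src=\"/static/logo.png\" width=75 height=90 />"));

        // Same tag with quoted values: accepted.
        Console.WriteLine(IsWellFormed("<img src=\"/static/logo.png\" width=\"75\" height=\"90\" />"));
    }
}
```

The first call prints False and the second prints True, which shows that the failure comes from the markup not being well-formed XML rather than from any reader setting.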