I’m using Saxon 9 to analyze invalid HTML sources. Specifically, the HTML has href values like the following:
<a href="blah.asp?fn=view&g_varID=1234">some text</a>
I’m getting errors:
Error reported by XML parser: The reference to entity "g_varID" must end with the ';' delimiter.
The XML parser reads the "&g_varID" string and complains that there should be a ";" to terminate the entity reference. But, of course, this is not intended as an entity reference at all; it’s just part of a URI query string.
How can I tell the parser to ignore it? Note: I’m using non-schema-aware Saxon, not Saxon-SA.
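A correct XHTML DOCTYPE will not fix this on its own. In xhtml1-strict.dtd the href attribute is indeed declared as CDATA (PCDATA is a content type for elements, not an attribute type), but an XML parser still scans CDATA attribute values for entity references. A literal & followed by a name is therefore always read as the start of an entity reference, and the document is not well-formed unless the ampersand is written as &amp;, as in href="blah.asp?fn=view&amp;g_varID=1234". If you cannot change the source, the ampersands have to be escaped before the XML parser sees the document, or the document has to be read with an HTML-tolerant parser instead of a strict XML parser.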
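If the parser still rejects the bare ampersand, one workaround is to escape it before the document reaches the XML parser. The following is only a minimal sketch in Python, not something Saxon provides: the helper name `escape_bare_ampersands` and the regular expression are assumptions made for illustration, and the standard-library `xml.etree.ElementTree` parser stands in for whatever XML parser Saxon is configured to use.

```python
import re
import xml.etree.ElementTree as ET

# Matches an "&" that does NOT begin something shaped like an entity or
# character reference (e.g. &amp; &#38; &#x26;). Hypothetical pattern,
# written for this illustration only.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(markup: str) -> str:
    """Replace every bare '&' with '&amp;' and leave existing references alone."""
    return BARE_AMP.sub("&amp;", markup)


# The link from the question, with straight quotes.
raw = '<a href="blah.asp?fn=view&g_varID=1234">some text</a>'

fixed = escape_bare_ampersands(raw)

# The escaped markup is well-formed, so a strict XML parser accepts it.
link = ET.fromstring(fixed)

print(fixed)
print(link.get("href"))
```

After parsing, the attribute value contains the original URI with a plain &, so nothing downstream needs to know the escaping happened. Note that this sketch only handles bare ampersands: HTML named entities such as &nbsp; are left untouched and would still be rejected by an XML parser that has no DTD declaring them, and other problems in invalid HTML (unclosed tags, unquoted attributes) need separate handling.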