I’m pulling the source of a website. I then want to extract a specific part of it. My intention is to do this with LINQ-to-XML.
However, I get errors when I parse the source:
XElement source = XElement.Load(reader);
The problem seems to be references to namespaces that are never declared. I get the error "'addthis' is an undeclared prefix. Line 130, position 51." because of this line:
<div class="addthis_toolbox addthis_pill_combo" addthis:url="http://www.foo.com/foo">
And if I delete that one, others occur.
Thing is, I only care about one piece of this XML file – I don’t need to be able to parse the whole file. I just want it in an XElement so I can find that one piece of it. Is there a way for me to hack around the parsing error? And I need a generic solution – I want to parse the file regardless of ANY undeclared prefix errors.
Thanks
This XML is not valid.
In order to use a namespace prefix (such as addthis:), the namespace must be declared by writing xmlns:addthis="some URI".
In general, you shouldn’t parse HTML using an XML parser, since HTML is likely to be invalid XML, for this reason and a number of others (undeclared entities, unescaped JS, unclosed tags).
Instead, use HTML Agility Pack.
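As a rough illustration, here is a minimal sketch of pulling one element out of a page with HTML Agility Pack, which is a third-party NuGet package (HtmlAgilityPack). The class name ToolboxExtractor, the method name FindToolboxUrl, and the XPath expression are assumptions made up for this example; adjust the XPath to whichever piece of the page you actually need.

```csharp
using HtmlAgilityPack; // third-party NuGet package, not part of the .NET base library

public static class ToolboxExtractor
{
    // Hypothetical helper: returns the addthis:url attribute of the first
    // div whose class contains "addthis_toolbox", or an empty string if
    // no such div (or no such attribute) exists.
    public static string FindToolboxUrl(string html)
    {
        var doc = new HtmlDocument();

        // LoadHtml is a lenient HTML parser, so an undeclared prefix such as
        // "addthis:" is kept as part of the attribute name instead of
        // causing a parse error.
        doc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches the XPath.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(
            "//div[contains(@class, 'addthis_toolbox')]");

        if (node == null)
        {
            return string.Empty;
        }

        // The second argument is the default returned when the attribute is absent.
        return node.GetAttributeValue("addthis:url", string.Empty);
    }
}
```

Because the document is parsed as HTML rather than XML, this should not depend on which prefixes happen to be undeclared. Once you have the HtmlNode, its OuterHtml or InnerText properties give you the markup or text of just that piece.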