Any good tutorial on parsing online HTML pages using MSXML/IXMLDOMDocument?
I need to parse HTML pages using XPath expressions.
Most probably some of the HTML pages will not be 100% valid, so I need to configure the parser to be more “friendly”, or at least less strict, for such pages.
Any ideas?
You can clean up invalid HTML using tidy or a tidy wrapper library. Once the markup is well-formed XHTML, you can parse it with MSXML, specifying the XHTML namespace so that your XPath expressions match.
EfTidy is a good, up-to-date open source tidy wrapper project for tidying up HTML.
I want to show an example written in VBScript that uses XPath to get the title of this question.
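A minimal sketch of the MSXML half of that approach, assuming the page has already been run through tidy and saved as well-formed XHTML. The file name `question.xhtml` and the prefix `x` are hypothetical, and the tidy step itself is not shown because it depends on the wrapper you choose.

```vbscript
Option Explicit

' Assumption: hypothetical file holding the page after tidy has
' converted it to well-formed XHTML.
Const XHTML_FILE = "question.xhtml"

Dim doc, titleNode
Set doc = CreateObject("MSXML2.DOMDocument.6.0")

doc.async = False              ' load synchronously
doc.validateOnParse = False    ' well-formedness is enough here
doc.resolveExternals = False   ' do not fetch the external XHTML DTD

' MSXML 6 refuses documents containing a DOCTYPE unless this is turned off.
doc.setProperty "ProhibitDTD", False

' XHTML elements live in a namespace, so XPath needs a prefix bound to it.
' The prefix "x" is an arbitrary choice.
doc.setProperty "SelectionNamespaces", "xmlns:x='http://www.w3.org/1999/xhtml'"

If doc.load(XHTML_FILE) Then
    Set titleNode = doc.selectSingleNode("/x:html/x:head/x:title")
    If titleNode Is Nothing Then
        WScript.Echo "No title element found."
    Else
        WScript.Echo titleNode.text
    End If
Else
    ' parseError reports why the document could not be loaded.
    WScript.Echo "Parse error: " & doc.parseError.reason
End If
```

Run it with `cscript` from the directory containing the file. Because the external DTD is not resolved, named entities such as `&nbsp;` would probably fail to parse; tidy's `numeric-entities` option is one way to avoid them in the tidied output.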
Hope it helps.