Any good tutorial on parsing online HTML pages using msxml/IXMLDOMDocument? I need to parse

Editorial Team

Asked: June 9, 2026 (2026-06-09T22:13:44+00:00)

Any good tutorial on parsing online HTML pages using msxml/IXMLDOMDocument?

I need to parse HTML pages using XPath expressions.

Some of the HTML pages will most probably not be 100% valid, so I need to configure the parser to be more “friendly”, or less strict, for such pages.

Any ideas?

1 Answer

Editorial Team · Answer 1 · 2026-06-09T22:13:46+00:00

You can clean up invalid HTML with Tidy or a Tidy wrapper library, converting it to XHTML. After that, you can parse the result with MSXML, specifying the XHTML namespace in your XPath queries.
EfTidy is a good, up-to-date open source Tidy wrapper for this.
Here is an example written in VBScript that uses XPath to get the title of this question.

'EfTidy constants
Const XhtmlOut = 1
Const DoctypeLoose = 3 'for transitional

Dim EfTidy, sInvalidHTML, sValidHTML

With CreateObject("MSXML2.XMLHTTP.6.0")
    'Third argument False = synchronous, so responseText is ready after .send
    .open "GET", "http://stackoverflow.com/q/12027205/", False
    .send
    sInvalidHTML = .responseText
End With

Set EfTidy = CreateObject("EfTidy.tidyCom")
With EfTidy.Option 'config
    .Clean = True
    .OutputType = XhtmlOut
    .DoctypeMode = DoctypeLoose
End With
sValidHTML = EfTidy.TidyMemToMem(sInvalidHTML)

With CreateObject("MSXML2.DomDocument.6.0")
    .async = False
    .validateOnParse = False
    .resolveExternals = True
    .setProperty "ProhibitDTD", False
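    If .LoadXml(sValidHTML) Then
        .setProperty "SelectionLanguage", "XPath"
        .setProperty "SelectionNamespaces", "xmlns:xhtml='http://www.w3.org/1999/xhtml'"
        'SelectSingleNode returns Nothing when the XPath matches no node
        Set oTitle = .SelectSingleNode("//xhtml:div[@id='question-header']/xhtml:h1")
        If Not oTitle Is Nothing Then WScript.Echo oTitle.Text
    Else
        'Report why the tidied markup still failed to parse
        WScript.Echo "Parse error: " & .parseError.reason
    End If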
End With
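
To see why the XPath above needs the xhtml: prefix, here is a minimal sketch that runs the same kind of query against a small inline XHTML string, with no network access or EfTidy involved. The sample markup, its text and the variable names are assumptions made up for illustration; only the MSXML calls already used in the example above are relied on.

```vbscript
' Minimal sketch: namespace-aware XPath against an inline XHTML string.
' The markup below is a made-up sample, not the real page.
Option Explicit

Dim sXhtml, oDoc, oNode

sXhtml = "<html xmlns='http://www.w3.org/1999/xhtml'>" & _
         "<head><title>Sample page</title></head>" & _
         "<body><div id='question-header'><h1>Sample heading</h1></div></body>" & _
         "</html>"

Set oDoc = CreateObject("MSXML2.DOMDocument.6.0")
oDoc.async = False

If oDoc.loadXML(sXhtml) Then
    ' Bind the (arbitrary) prefix "xhtml" to the XHTML namespace URI.
    oDoc.setProperty "SelectionNamespaces", "xmlns:xhtml='http://www.w3.org/1999/xhtml'"

    ' Unprefixed names only match elements in no namespace,
    ' so this query finds nothing in an XHTML document.
    Set oNode = oDoc.selectSingleNode("//div[@id='question-header']/h1")
    If oNode Is Nothing Then WScript.Echo "Unprefixed XPath: no match"

    ' With the prefix, the same path matches the h1 element.
    Set oNode = oDoc.selectSingleNode("//xhtml:div[@id='question-header']/xhtml:h1")
    If Not oNode Is Nothing Then WScript.Echo "Prefixed XPath: " & oNode.text
Else
    WScript.Echo "Parse error: " & oDoc.parseError.reason
End If
```

Save it as a .vbs file and run it with cscript. The prefix name itself is free to choose; what matters is that it is bound to the same namespace URI that the tidied document declares.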

Hope it helps.
