I’m using the HTMLAgilityPack to parse HTML pages. However at some point I try


Asked: May 18, 2026 (2026-05-18T23:16:14+00:00)

I’m using the HTMLAgilityPack to parse HTML pages. However, at some point I try to parse the wrong kind of data (in this specific case an image), which of course fails for obvious reasons.

Private Sub parseHtml(ByVal content As String, ByVal url As String)
    Try
        Dim contentHash As String = hashGenerator.ComputeHash(content, "SHA1")
        Dim doc As HtmlDocument = New HtmlDocument()

        doc.Load(New StringReader(content))

        Dim root As HtmlNode = doc.DocumentNode
        Dim anchorTags As New List(Of String)

        For Each link As HtmlNode In root.SelectNodes("//a")
            cururl = link.OuterHtml
            If link.Attributes("href") Is Nothing Then Continue For
            If Uri.IsWellFormedUriString(link.Attributes("href").Value, UriKind.Absolute) Then
                urlQueue.Enqueue(link.Attributes("href").Value)
            Else
                Dim myUri As New Uri(url)
                urlQueue.Enqueue(myUri.Scheme & "://" & myUri.Host & link.Attributes("href").Value)
            End If
        Next
    Catch ex As Exception
        MsgBox(ex.Message, MsgBoxStyle.Critical, "Error (parseHtml(" & url & "))")
    End Try
End Sub

The error I get is:

A first chance exception of type ‘System.NullReferenceException’ occurred in Webcrawler.exe
Object reference not set to an instance of an object.

On the content I try to parse:

��Iޥ�+�: 8�0�x�

How can I check whether the content is parseable before trying to parse it, so that the error is prevented?

For now it is an image that triggers the error popup, but I think it could be anything that isn’t (X)HTML.

Thanks in advance, oh great community 🙂


1 Answer

Editorial Team · Answer 1 · 2026-05-18T23:16:15+00:00


You need to check the Content-Type header of the response before trying to parse the returned data.

For an HTML page this should be text/html; for XHTML it would be application/xhtml+xml.
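The check described above could look roughly like the following. This is a minimal sketch, not the asker's actual code: the module name `ContentTypeCheck` and the function `IsHtmlContentType` are hypothetical, and it assumes the raw header value (which may carry parameters such as `; charset=utf-8`) is available as a string.

```vbnet
Imports System

Module ContentTypeCheck
    ' Hypothetical helper: returns True when the Content-Type header value
    ' names an HTML or XHTML media type.
    Function IsHtmlContentType(ByVal contentType As String) As Boolean
        If String.IsNullOrEmpty(contentType) Then Return False

        ' Drop parameters such as "; charset=utf-8" and normalise the case.
        Dim mediaType As String = contentType.Split(";"c)(0).Trim().ToLowerInvariant()

        Return mediaType = "text/html" OrElse mediaType = "application/xhtml+xml"
    End Function

    Sub Main()
        Console.WriteLine(IsHtmlContentType("text/html; charset=utf-8")) ' True
        Console.WriteLine(IsHtmlContentType("image/png"))                ' False
    End Sub
End Module
```

Assuming the page is downloaded with `HttpWebRequest`, the value to pass in would be the response's `ContentType` property, and `parseHtml` would only be called when the helper returns True. The question does not show the download code, so that wiring is an assumption. Separately, HtmlAgilityPack's `SelectNodes` returns Nothing when no node matches the XPath, so guarding the `For Each` loop against a Nothing result is probably worthwhile as well.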
