I have spent way too much time trying to sort this little issue out. I have narrowed the issue down to the exact procedure that throws the error. Yes, I have used Google. Just throwing that out there before some wise guy replies “search google”!
Anyways, here:
Try
    Dim tempSource As String = Nothing
    Console.WriteLine("Loading document...")
    Dim FILE_NAME As String = "C:\pathto\file.txt"
    If System.IO.File.Exists(FILE_NAME) = True Then
        Dim objReader As New System.IO.StreamReader(FILE_NAME)
        tempSource = objReader.ReadToEnd
        objReader.Close()
        Console.WriteLine("Loaded the document!")
    Else
        Console.WriteLine("Error loading document!")
        MsgBox("Error loading document!")
        Exit Sub
    End If
    Dim doc As HtmlDocument = Nothing
    If tempSource IsNot Nothing Then
        Console.WriteLine("Temp Source was not nothing, so loading HAP doc")
        doc.Load(tempSource) '<--- This is where the error is!!!!!
        Console.WriteLine("HAP doc loaded!")
    Else
        Console.WriteLine("Error: Unable to load source file into parser!")
        MsgBox("Error: Unable to load source file into parser!")
        Exit Sub
    End If
    Console.WriteLine("Document loaded!")
    Console.WriteLine("Processing...")
    For Each node As HtmlNode In doc.DocumentNode.Elements("//site")
        'my code to process each element here
        'not important because my app doesnt get this far lol
    Next
Catch ex As Exception
    Console.WriteLine("Caught Exception: {0}", ex.Message)
End Try
I am loading a text file that contains about 1100 lines, and each line is going to be processed with HTML Agility Pack. From what I can tell, it throws the error when it runs “doc.loadhtml(richtextbox1)”. I have also tried loading the file into a string and loading the string with “doc.loadhtml(thestring)”. It doesn’t make a difference, it still errors.
Here is a sample of how each line looks:
<Site Index="" Name="" Group="" PR="" />
<Site Index="" Name="" Group="" PR="" />
<Site Index="" Name="" Group="" PR="" />
<Site Index="" Name="" Group="" PR="" />
<Site Index="" Name="" Group="" PR="" />
<Site Index="" Name="" Group="" PR="" />
I am using HTML Agility Pack, however the above is what is on every line, about 1100 lines! For testing, I have a smaller text file made of about 50 lines before I load up the 1100 line file 😉 There aren’t any HTML, HEAD, or BODY tags! They aren’t needed for my parsing. I am using HTML Agility Pack because it is easy to parse elements with. I can grab each value easily from each line.
I am not sure if maybe the error is because it technically isn’t HTML? Meaning, since the loaded code doesn’t have an HTML or BODY tag, it errors? I wanted to get this question posted, and while I am waiting on some answers, I am going to parse the document another way. Just curious as to what the deal is and why HTML Agility Pack isn’t working. More of a proof of concept than anything, for my own learning and knowledge.
Here is the error I get (btw, the doc.load() line is where it throws the exception):
Object reference not set to an instance of an object
Last Note: The routine is on a background thread. I have used multi-threading before, and have delegates created for deeper in the code. Maybe I am just overlooking something, I did write most of the code last night at like 3 am lol.
It looks like you’re not initializing the document…
I don’t write VB, mostly C#, Java and C++, but
doc = Nothing seems like doc = null, and when you invoke the load(...) method on a null object, I would expect to see the “Object reference not set to an instance of an object” exception.
Try initializing the doc to an actual HtmlDocument.
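Something along these lines might do it. This is only a rough sketch, not tested against your project: the two sample lines standing in for the file contents and the `Example` module are made up for illustration. It also swaps `Load` for `LoadHtml`, since as far as I know `Load` expects a file path or stream while `LoadHtml` parses markup that is already in a string.

```vbnet
Imports System
Imports HtmlAgilityPack

Module Example
    Sub Main()
        ' Assumption: stand-in for the text read from the file (two sample lines).
        Dim tempSource As String =
            "<Site Index="""" Name="""" Group="""" PR="""" />" & vbCrLf &
            "<Site Index="""" Name="""" Group="""" PR="""" />"

        ' Create an actual instance instead of assigning Nothing.
        Dim doc As New HtmlDocument()

        ' tempSource holds markup, not a path, so parse it with LoadHtml.
        doc.LoadHtml(tempSource)

        ' SelectNodes takes an XPath expression and returns Nothing when
        ' nothing matches, so check before looping. Lower-case "site" is
        ' used because the parser normalizes element names to lower case.
        Dim nodes = doc.DocumentNode.SelectNodes("//site")
        If nodes IsNot Nothing Then
            For Each node As HtmlNode In nodes
                Console.WriteLine(node.OuterHtml)
            Next
        End If
    End Sub
End Module
```

If you would rather keep `Load`, pass it the file path (`FILE_NAME` in your code) rather than the contents you already read into `tempSource`.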