I have the following HTML, grabbed from a TV listings page:
<div class="channel_row">
<span class="channel">
<div class="logo"><img src ="/images/channel_logos/WGNAMER.png" /></div>
<p><strong>2</strong><br />
WGNAMER
</p>
</span>
<span class="time" style="width:0.0px;padding:0;height:42px;">
<div style="margin:10px">
<a class="thickbox" style="" href="/tv/info/?program_id=49909&height=260&width=612" title="WGN News at Nine">WGN News at Nine</a>
<p class="schedule_flags"><strong class="new_flag">New</strong>, <strong class="cc_flag">CC</strong>, <strong class="stereo_flag">Stereo</strong></p>
</div>
</span>
<span class="time" style="width:245.6px;padding:0;height:42px;">
<div style="margin:10px">
<a class="thickbox" style="" href="/tv/info/?program_id=49910&height=260&width=612" title="America's Funniest Home Videos">America's Funniest Home Videos</a>
<p class="schedule_flags"><strong class="cc_flag">CC</strong>, <strong class="stereo_flag">Stereo</strong></p>
</div>
</span>
</div>
The channel_row block repeats like this for every channel on the page.
I have set up some VB code using HtmlAgilityPack, hoping for a quick and easy way to loop through these elements and grab the logo image, the channel number, the station name and, for each show, the HREF of its description page and its title.
For the example above, the parsed output would look like:
/images/channel_logos/WGNAMER.png
2
WGNAMER
/tv/info/?program_id=49909&height=260&width=612
WGN News at Nine
/tv/info/?program_id=49910&height=260&width=612
America's Funniest Home Videos
My VB code is:
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
Dim htmlString As String = "<div class=""channel_row"">" & _
"<span class=""channel"">" & _
"<div class=""logo""><img src =""/images/channel_logos/WELF.png"" /></div>" & _
"<p><strong>13</strong><br />" & _
"WELF" & _
"</p>" & _
"</span>" & _
"<span class=""time"" style=""width:245.6px;padding:0;height:42px;"">" & _
"<div style=""margin:10px"">" & _
"<a class=""thickbox"" style="""" href=""/tv/info/?program_id=35424&height=260&width=612"" title=""Praise the Lord"">Praise the Lord</a>" & _
"<p class=""schedule_flags""><strong class=""cc_flag"">CC</strong></p>" & _
"</div>" & _
"</span>" & _
"<span class=""time"" style=""width:122.8px;padding:0;height:42px;"">" & _
"<div style=""margin:10px"">" & _
"<a class=""thickbox"" style="""" href=""/tv/info/?program_id=35425&height=260&width=612"" title=""ACLJ This Week"">ACLJ This Week</a> " & _
"<p class=""schedule_flags""><strong class=""cc_flag"">CC</strong></p>" & _
"</div>" & _
"</span>" & _
"<span class=""time"" style=""width:122.8px;padding:0;height:42px;"">" & _
"<div style=""margin:10px"">" & _
"<a class=""thickbox"" style="""" href=""/tv/info/?program_id=35426&height=260&width=612"" title=""Full Flame"">Full Flame</a> " & _
"<p class=""schedule_flags""><strong class=""cc_flag"">CC</strong></p>" & _
"</div>" & _
"</span>" & _
"<span class=""time"" style=""width:0.0px;padding:0;height:42px;"">" & _
"<div style=""margin:10px"">" & _
"<a class=""thickbox"" style="""" href=""/tv/info/?program_id=35427&height=260&width=612"" title=""Secrets: Kim Clement"">Secrets: Kim Clement</a> " & _
"<p class=""schedule_flags""></p>" & _
"</div>" & _
"</span>" & _
"</div>"
Dim doc = New HtmlAgilityPack.HtmlDocument()
Dim htmlDocument As IHTMLDocument2 = New HTMLDocumentClass()
htmlDocument.write(htmlString)
htmlDocument.close()
doc.LoadHtml(String.Format(htmlString))
Dim res = doc.DocumentNode.SelectNodes("//div[@class='channel_row']")
For Each item In res
Dim firstDiv = item.SelectSingleNode(".//div[@class='channel']")
Dim content1 = firstDiv.ChildNodes(0).InnerText.Trim()
Dim content2 = firstDiv.ChildNodes(1).InnerText.Trim()
Dim content4 = item.SelectSingleNode(".//div[@class='myclass2']")
Next
End Sub
Currently the line Dim content1 = firstDiv.ChildNodes(0).InnerText.Trim() throws this error:
Object reference not set to an instance of an object.
Any help would be great!
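For what it is worth, one likely cause is visible in the sample markup: the element with class "channel" is a span, not a div, so SelectSingleNode(".//div[@class='channel']") finds nothing and returns Nothing, and the next line then dereferences it. Below is a minimal sketch of a guarded lookup. It uses only the HtmlAgilityPack calls already shown above (LoadHtml, SelectNodes, SelectSingleNode); the module name ChannelSketch and the function ReadChannelText are made up for illustration.

```vb
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module ChannelSketch
    ' Hypothetical helper: returns the trimmed text of each channel header.
    Function ReadChannelText(ByVal html As String) As List(Of String)
        Dim result As New List(Of String)()
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        Dim rows As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//div[@class='channel_row']")
        If rows Is Nothing Then Return result

        For Each row As HtmlNode In rows
            ' In the sample markup the class="channel" element is a span, not a div.
            Dim channel As HtmlNode = row.SelectSingleNode(".//span[@class='channel']")
            If channel Is Nothing Then Continue For
            result.Add(channel.InnerText.Trim())
        Next
        Return result
    End Function
End Module
```

Checking for Nothing after every SelectNodes or SelectSingleNode call means a row with unexpected markup is skipped rather than crashing the whole loop.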
UPDATE
With the newest code suggestions:
Dim doc = New HtmlAgilityPack.HtmlDocument()
doc.LoadHtml(htmlString)
Dim all = new Dictionary(of String, Object)()
For Each channel In doc.DocumentNode.SelectNodes(".//div[@class='channel_row']")
Dim info = new Dictionary(of String, Object)()
With channel
info!Logo = .SelectSingleNode(".//img").Attributes("src").Value
info!Channel = .SelectSingleNode(".//span[@class='channel']").ChildNodes(1).ChildNodes(0).InnerText
info!Station = .SelectSingleNode(".//span[@class='channel']").ChildNodes(1).ChildNodes(2).InnerText
info!Shows = From tag In .SelectNodes(".//a[@class='thickbox']")
Select New With {.Show = tag.Attributes("title").Value, .Link = tag.Attributes("href").Value}
End With
all.Add(info!Station, info)
Next
all.Dump()
This produces three errors:
1) On line Select New With {.Show = tag.Attributes("title").Value, .Link = tag.Attributes("href").Value}
The error is: 'Select Case' must end with a matching 'End Select'.
2) On line all.Add(info!Station, info)
The error is: Statements and labels are not valid between 'Select Case' and first 'Case'.
3) On line all.Dump()
The error is: 'Dump' is not a member of 'System.Collections.Generic.Dictionary(Of String, Object)'.
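The first two errors look consistent with the compiler reading the line that starts with Select as a Select Case statement. That happens on compilers without implicit line continuation (before Visual Basic 2010), where a query split across lines needs a trailing underscore on each continued line. The third error arises because Dump() is an extension method supplied by LINQPad, not part of the .NET Framework. Here is a rough sketch of both workarounds, assuming Option Infer is On (needed for the anonymous type); ListingSketch, ReadShows and PrintShows are hypothetical names, not part of HtmlAgilityPack.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports HtmlAgilityPack

Module ListingSketch
    ' Hypothetical helper: maps each show title to its link for one channel row.
    Function ReadShows(ByVal row As HtmlNode) As Dictionary(Of String, String)
        Dim shows As New Dictionary(Of String, String)()
        Dim links As HtmlNodeCollection = row.SelectNodes(".//a[@class='thickbox']")
        If links Is Nothing Then Return shows

        ' The trailing underscores keep From ... Select as one statement,
        ' so Select is not parsed as the start of a Select Case block.
        Dim query = From tag As HtmlNode In links _
                    Select New With {.Show = tag.Attributes("title").Value, _
                                     .Link = tag.Attributes("href").Value}

        For Each item In query
            shows(item.Show) = item.Link
        Next
        Return shows
    End Function

    ' Stand-in for LINQPad's Dump(): writes each pair to the console.
    Sub PrintShows(ByVal shows As Dictionary(Of String, String))
        For Each pair As KeyValuePair(Of String, String) In shows
            Console.WriteLine("{0} -> {1}", pair.Key, pair.Value)
        Next
    End Sub
End Module
```

Outside LINQPad, any plain loop over the dictionary (or a debugger watch) can replace Dump().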
I’m no HtmlAgilityPack expert, but how about: