I am trying to parse this HTML document to get the contents of the flight, time, origin and date cells and output them.
<div id="FlightInfo_FlightInfoUpdatePanel">
  <table cellspacing="0" cellpadding="0">
    <tbody>
      <tr class="">
        <td class="airline"><img src="/images/airline logos/US.gif" title="US AIRWAYS. " alt="US AIRWAYS. " /></td>
        <td class="flight">US5316</td>
        <td class="codeshare">NZ46</td>
        <td class="origin">Rarotonga</td>
        <td class="date">02 Sep</td>
        <td class="time">10:30</td>
        <td class="est">21:30</td>
        <td class="status">CHECK IN CLOSING</td>
      </tr>
I am using this code, based on the HTML Agility Pack for Windows Phone 7, to find and output the content of <td class="flight">US5316</td>:
void client_DownloadStringCompleted(object sender, DownloadStringCompletedEventArgs e)
{
    var html = e.Result;
    var doc = new HtmlDocument();
    doc.LoadHtml(html);
    var node = doc.DocumentNode.Descendants("div")
        .FirstOrDefault(x => x.Id == "FlightInfo_FlightInfoUpdatePanel")
        .Element("table")
        .Element("tbody")
        .Elements("tr")
        .Where(tr => tr.GetAttributeValue("td", "").Contains("class"))
        .SelectMany(tr => tr.Descendants("flight"))
        .ToArray();
    this.scrollViewer1.Content = node;
    //Added below
    listBox1.itemSource = node;
}
I get no results in either the ScrollViewer or the ListBox. Is the LINQ query that I am using correct for the HTML I supplied?
What do you intend to do with this line?
.Where(tr => tr.GetAttributeValue("td", "").Contains("class"))
GetAttributeValue(name, def) looks for an attribute with the key name in the node, and it returns the value of that attribute if it finds one. Otherwise, it returns the default value def.
So what’s actually happening here is that <tr> doesn’t have any attribute with the key td, so it’s returning the default value (an empty string), which does not contain the substring “class”, so your <tr> node is being filtered out. The next step has the same kind of problem: Descendants("flight") looks for elements named flight, but flight is a class value on a <td>, not an element name, so it would match nothing either.
Edit:
This will return an array where each entry is an array of 8 strings containing the contents of each td:
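The snippet that belongs here is missing from this copy, so what follows is only a minimal sketch of a query with that shape, not necessarily the exact code originally posted. It assumes the HtmlAgilityPack types already used in the question; the FlightTable class and ParseRows method names are made up for illustration. It uses Descendants("tr") instead of chaining Element("table").Element("tbody"), on the assumption that the downloaded markup may not always contain an explicit <tbody>.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, named for illustration only.
public static class FlightTable
{
    // Returns one string[] per <tr>; each inner array holds the trimmed
    // text of that row's <td> cells, in document order.
    public static string[][] ParseRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var panel = doc.DocumentNode.Descendants("div")
            .FirstOrDefault(x => x.Id == "FlightInfo_FlightInfoUpdatePanel");

        // Guard against the div being absent instead of throwing on null.
        if (panel == null)
            return new string[0][];

        return panel.Descendants("tr")
            .Select(tr => tr.Elements("td")
                .Select(td => td.InnerText.Trim())
                .ToArray())
            .ToArray();
    }
}
```

For the sample row in the question, the first entry (the airline cell) should come back as an empty string because that cell only contains an <img>. Note also that the ListBox property is spelled ItemsSource, so something like listBox1.ItemsSource = rows.Select(r => string.Join(" ", r)).ToList(); is probably closer to what the handler needs than assigning to itemSource.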
Examples:
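The original examples are not preserved here. As a stand-in, here is a hedged sketch of how a single field could be read by its class value. This is where GetAttributeValue fits: it is called on each <td> with the attribute name "class", rather than on the <tr> with "td". The FlightCells class and the CellText and FirstFlight method names are assumptions introduced for illustration.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, named for illustration only.
public static class FlightCells
{
    // Returns the trimmed text of the first <td> in the row whose class
    // attribute equals cssClass, or null when no such cell exists.
    public static string CellText(HtmlNode row, string cssClass)
    {
        var cell = row.Elements("td")
            .FirstOrDefault(td => td.GetAttributeValue("class", "") == cssClass);
        return cell == null ? null : cell.InnerText.Trim();
    }

    // Reads the flight number from the first row of the given markup.
    public static string FirstFlight(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var row = doc.DocumentNode.Descendants("tr").FirstOrDefault();
        return row == null ? null : CellText(row, "flight");
    }
}
```

Calling CellText(row, "origin"), CellText(row, "date") and CellText(row, "time") on the same row would give the other fields asked about. Matching on class names rather than on column position keeps the code working if the column order changes.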