As a continuation of this post, I am trying to parse some data out of an HTML page. Here is the HTML (there is more info on the page, but this is the important section):
<table class="integrationteamstats">
<tbody>
<tr>
<td class="right">
<span class="mediumtextBlack">Queue:</span>
</td>
<td class="left">
<span class="mediumtextBlack">0</span>
</td>
<td class="right">
<span class="mediumtextBlack">Aban:</span>
</td>
<td class="left">
<span class="mediumtextBlack">0%</span>
</td>
<td class="right">
<span class="mediumtextBlack">Staffed:</span>
</td>
<td class="left">
<span class="mediumtextBlack">0</span>
</td>
</tr>
<tr>
<td class="right">
<span class="mediumtextBlack">Wait:</span>
</td>
<td class="left">
<span class="mediumtextBlack">0:00</span>
</td>
<td class="right">
<span class="mediumtextBlack">Total:</span>
</td>
<td class="left">
<span class="mediumtextBlack">0</span>
</td>
<td class="right">
<span class="mediumtextBlack">On ACD:</span>
</td>
<td class="left">
<span class="mediumtextBlack">0</span>
</td>
</tr>
</tbody>
</table>
I need to get 2 pieces of information: the data inside of the td below Queue and the data inside the td below Wait (so the Queue count and wait time). Obviously the numbers are going to update frequently.
I have gotten to the point where the HTML is pulled into an HtmlDocument variable, and I’ve found something along the lines of using an HtmlNodeCollection to gather nodes that meet certain criteria. This is basically where I am stuck:
// Select every td element in the document.
HtmlNodeCollection tds = this.html.DocumentNode.SelectNodes("//td");
foreach (HtmlNode td in tds)
{
/* I want to write:
* If the last node's value was 'Queue', give me the value of this node.
* and
* If the last node's value was 'Wait Time', give me the value of this node.
*/
}
And I can go through this with a foreach, but I am not certain how to access the value or how to get the next value.
Generally, there’s no need to go through the nodes with a foreach, as getting the targeted information is pretty easy (with a foreach you’d have to manage state across each iteration of the loop, which gets really unwieldy).

First, you want to get the table. Filtering on the class attribute is generally a bad idea, as multiple elements in an HTML document can have the same class applied to them. If the table had an id attribute, that would be ideal.

That said, if this is the only table with this class, then you can get the body of the table element with a single XPath query that matches on the class.

From there, you want to get the individual rows. Since these are direct children of the tbody element, you can get the rows by position through the ChildNodes property.

Then you want the second td element in each row. While there’s a span tag in there that wraps the content, you want all of the text in the td element in its entirety, so you can use the InnerText property to get the value.

Note that there’s replication here, so if you find there are a lot of rows that you have to parse like this, you might want to factor out some of the logic into helper methods.
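
The steps described above can be sketched roughly as follows. This is only an illustration under a few assumptions, not necessarily the exact code the answer had in mind: the TeamStats class and the SecondCellText helper are hypothetical names, the inline HTML is a cut-down copy of the table from the question, and the rows are picked with Elements("tr") rather than indexing ChildNodes directly, because ChildNodes also contains the whitespace text nodes that sit between tags when the markup is indented. Trim() is added for the same reason, since InnerText keeps the surrounding line breaks.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

static class TeamStats
{
    // Hypothetical helper: returns the trimmed text of the second <td>
    // in the row at the given zero-based index.
    static string SecondCellText(HtmlNode tbody, int rowIndex)
    {
        // Elements("tr") returns only the <tr> children, skipping any
        // whitespace text nodes that ChildNodes would also include.
        HtmlNode row = tbody.Elements("tr").ElementAt(rowIndex);
        HtmlNode cell = row.Elements("td").ElementAt(1);

        // InnerText flattens the wrapping <span>; Trim drops stray whitespace.
        return cell.InnerText.Trim();
    }

    static void Main()
    {
        // Cut-down sample of the table from the question (assumed input).
        var html = new HtmlDocument();
        html.LoadHtml(
            "<table class=\"integrationteamstats\"><tbody>" +
            "<tr><td><span>Queue:</span></td><td><span>0</span></td></tr>" +
            "<tr><td><span>Wait:</span></td><td><span>0:00</span></td></tr>" +
            "</tbody></table>");

        // Assumes this is the only table carrying the class.
        HtmlNode tbody = html.DocumentNode.SelectSingleNode(
            "//table[@class='integrationteamstats']/tbody");
        if (tbody == null)
        {
            Console.WriteLine("Stats table not found.");
            return;
        }

        string queue = SecondCellText(tbody, 0); // first row: Queue
        string wait = SecondCellText(tbody, 1);  // second row: Wait
        Console.WriteLine($"Queue: {queue}, Wait: {wait}");
    }
}
```

Putting the row and cell lookup in one helper is one way to handle the replication mentioned above; if the page layout changes, only the XPath expression and the two row indexes should need updating.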