I am trying to parse this HTML page here with HTML Agility Pack, but

Editorial Team

Asked: May 30, 2026

I am trying to parse this HTML page here with HTML Agility Pack, but I cannot seem to get it to work as expected.

This is my page (shortened):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="de-ch" xml:lang="de-ch">
<head>
</head>
<body id="Adressservices">
    <div id="page">
        <div id="page-544">
            <table class="full">
                <thead>
                    <tr>
                        <th class="first" scope="col" style="width: 18%;">Type</th>
                        <th class="col" style="width: 20%;">Name</th>
                        <th class="col">Date</th>
                        <th class="col" style="text-align: right; width: 10%;">Size</th>
                    </tr>
                </thead>
                <tbody>
                    <tr>
                        <td class="first">Change</td>
                        <td><a href="/download?file=5210044">somefile01.zip</a></td>
                        <td style="width: 5%;"><b class="filesize">2012-03-01</b></td>
                        <td style="text-align: right;"><b class="filesize">881.00</b></td>
                    </tr>
                    <tr>
                        <td class="first">Change</td>
                        <td><a href="/download?file=7610042">somefile02.zip</a></td>
                        <td style="width: 5%;"><b class="filesize">2012-02-01</b></td>
                        <td style="text-align: right;"><b class="filesize">1400.00</b></td>
                    </tr>
                    <tr>.....</tr>
                </tbody>
            </table>
        </div>
    </div>
</body>
</html>

The real page has quite a few more <tr>....</tr> rows in that table.

I was able to download the page just fine with HTML Agility Pack using this code snippet:

HtmlWeb web = new HtmlWeb();
HtmlDocument archiveDoc = web.Load(_archiveUrl);
var tables = archiveDoc.DocumentNode.SelectNodes("//table");

So I get a handle on my <table> element, works just fine.

Now I was trying to get the first <tr> element from within that table, and I tried this:

HtmlNode node = tables[0];
var allTRNodes = node.SelectNodes("tbody/tr");
var firstTR = allTRNodes[0];

Here, I’m not getting the n <tr> nodes as expected – but just two. And the first of those doesn’t contain a list of y child nodes of type <td> either …

Then I tried Linq-to-“HTML”:

HtmlNode node = tables[0];
var firstTR = node.Element("tbody").Element("tr");

but again: I’m not getting the first <tr> node containing a list of y child nodes of type <td> either …

Trying to get the list of all <td> nodes inside the first <tr> also didn’t work quite as expected:

HtmlNode node = tables[0];
var allTDNodes = node.SelectNodes("tbody/tr/td");
var firstTD = allTDNodes[0];

Instead of the y <td> nodes I expected, I’m getting just three child nodes: two of type #text and the last one of type <td>. Why?

It seems like HTML Agility Pack is misinterpreting the list of <td> nodes as nested nodes...

Any ideas? Thoughts? Hints how to solve this?

1 Answer

Editorial Team · Answer 1 · 2026-05-30T17:32:29+00:00

Use the descendant axis, as in this example:

var linkNode = doc.DocumentNode.SelectSingleNode("//div[@id=\"content-wrapper\"]/dl/dd");
var hrefNode = linkNode.SelectSingleNode("descendant::a");

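One thing I don’t like about HTML Agility Pack: when the XPath starts with // (or /), a node.SelectNodes or node.SelectSingleNode call searches the DOM from the top of the document rather than from the current node. A relative expression such as descendant::tr stays within the current node.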
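
To make the scoping difference concrete, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is referenced; the two-table HTML, the class name XPathScopeDemo, and the method CountRows are made up for illustration and are not from the question.

```csharp
using System;
using HtmlAgilityPack;

public static class XPathScopeDemo
{
    // Returns how many <tr> nodes each XPath style finds when called on the second table.
    public static (int Absolute, int Relative) CountRows()
    {
        // Hypothetical document with two tables, used only for this illustration.
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<div><table id='a'><tr><td>1</td></tr></table>" +
            "<table id='b'><tr><td>2</td></tr><tr><td>3</td></tr></table></div>");

        HtmlNode second = doc.DocumentNode.SelectSingleNode("//table[@id='b']");

        // "//tr" is an absolute path: it restarts at the document root,
        // so rows from both tables are matched.
        int absolute = second.SelectNodes("//tr")?.Count ?? 0;

        // "descendant::tr" (or ".//tr") is relative: only rows inside
        // the second table are matched.
        int relative = second.SelectNodes("descendant::tr")?.Count ?? 0;

        return (absolute, relative);
    }

    public static void Main()
    {
        var counts = CountRows();
        Console.WriteLine($"absolute: {counts.Absolute}, relative: {counts.Relative}");
    }
}
```

The null-conditional on SelectNodes is there because the method returns null, not an empty collection, when nothing matches.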

Here’s a sample adapted to your case:

// table (docNode is the document's root node, e.g. archiveDoc.DocumentNode)
var tableNode = docNode.SelectSingleNode("//table");
// first tr inside that table
var trNode = tableNode.SelectSingleNode("descendant::tr");

// equivalent, but overkill; XPath positions are 1-based, so use [1], not [0]
var trNode1 = tableNode.SelectSingleNode("descendant::tr[1]");

// then the first td inside that tr
var tdNode = trNode.SelectSingleNode("descendant::td");
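
Putting the pieces together, the following is a rough sketch of reading every cell of a row, again assuming the HtmlAgilityPack NuGet package. The #text nodes mentioned in the question are most likely the whitespace between tags, which HTML Agility Pack keeps as text nodes in ChildNodes; Elements("td") is one way to skip them. The class name TableRowsDemo, the method FirstRowCells, and the trimmed-down HTML are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TableRowsDemo
{
    // Returns the trimmed text of each <td> in the first <tr> of the first table.
    public static List<string> FirstRowCells(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode table = doc.DocumentNode.SelectSingleNode("//table");
        HtmlNode row = table?.SelectSingleNode("descendant::tr");
        if (row == null) return new List<string>(); // nothing matched

        // row.ChildNodes would also contain whitespace "#text" nodes;
        // Elements("td") returns only the direct <td> element children.
        return row.Elements("td").Select(td => td.InnerText.Trim()).ToList();
    }

    public static void Main()
    {
        // Shortened version of the table from the question.
        string html = @"
<table class='full'>
  <tbody>
    <tr>
      <td class='first'>Change</td>
      <td><a href='/download?file=5210044'>somefile01.zip</a></td>
      <td><b class='filesize'>2012-03-01</b></td>
      <td><b class='filesize'>881.00</b></td>
    </tr>
  </tbody>
</table>";

        Console.WriteLine(string.Join(" | ", FirstRowCells(html)));
    }
}
```

If the real page still yields fewer rows than expected, it may be worth checking the raw markup for unclosed tags, since this sketch only covers well-formed input.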
