Scope
I am trying to parse this page. For those who are not familiar with Portuguese, the page lists all the subjects of a university course, grouped by semester.
So every time you see something like “7º Período Ideal”, you can read it as “Subjects from the 7th semester”.
Problem
I am using an XPath expression to get all the table rows from the one table that contains them.
XPath Used : //table[@cellspacing=2]//tr
C# Statement : htmlMap.DocumentNode.SelectNodes("//table[@cellspacing=2]//tr");
The HtmlNodeCollection returned by this C# statement contains only the table row nodes up to the one with the text “EAD0648 Gerência de Produtos / Serviços e Mercados”, right after the one with “5º Período Ideal”.
The XPath below “works”, but it returns every tr on the page (as expected), which is not what I want.
//tr
Why is the XPath not retrieving the nodes after this one as well?
Is there a cap on the number of nodes retrieved?
Am I missing something?
Thanks in advance
I have encountered this in the past: if the tables are not well formed, issues like this occur. I took a very quick look at the HTML for the page and I see what looks like a possible problem: on line 2785 there is a </tr>, and then, without an opening <tr>, line 2796 has another </tr>. I admit that I did not do an in-depth validation to check, but just by looking at it I could not match the opening <tr>. I checked this first because, as I mentioned, I have faced this exact issue with pages that have malformed tables.
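
If you want to check for stray closing tags without reading the source by eye, a rough scan along these lines can help. It is only a sketch: TrBalanceChecker and FindUnmatchedClosingRows are hypothetical names, it uses a regular expression and not a real HTML parser, and it assumes every <tr> is explicitly closed (HTML allows the closing tag to be omitted, and tags inside comments or scripts are counted too), so treat its output as a hint about where to look and not as proof.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: a rough diagnostic, not an HTML parser.
public static class TrBalanceChecker
{
    // Returns the 1-based line number of every </tr> that has no
    // currently open <tr> before it in the raw markup.
    public static List<int> FindUnmatchedClosingRows(string html)
    {
        var unmatched = new List<int>();
        // Matches <tr ...> and </tr>, but not tags such as <track>.
        var tagPattern = new Regex(@"<(/?)tr\b[^>]*>", RegexOptions.IgnoreCase);
        int openRows = 0;

        foreach (Match tag in tagPattern.Matches(html))
        {
            bool isClosing = tag.Groups[1].Value == "/";
            if (!isClosing)
            {
                openRows++;            // an opening <tr>
            }
            else if (openRows > 0)
            {
                openRows--;            // a </tr> that closes an open row
            }
            else
            {
                // a </tr> with nothing left to close
                unmatched.Add(LineNumberAt(html, tag.Index));
            }
        }
        return unmatched;
    }

    // Counts newlines before the given character index.
    private static int LineNumberAt(string text, int index)
    {
        int line = 1;
        for (int i = 0; i < index; i++)
        {
            if (text[i] == '\n') line++;
        }
        return line;
    }
}
```

Pass it the raw page source (for example the string you downloaded before loading it into the HtmlDocument) and compare the reported line numbers with the rows where your node collection stops.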