I would like to parse my HTML page in as generic a way as possible. I don’t want to rebuild the parser every time the page changes, so I would like to parse it by the content of the tags rather than by their type.
I know that the HTML Agility Pack provides tools to read and search by the type of tag (td, strong, li, etc.), but I would like to iterate over all the tags and find the information by the content of each tag, not by its type, because the type can change.
Example:
The page:
<table>
<tr valign="top">
<td valign="top">Sex:<br />
</td><td valign="top">Male<br />
</td></tr>
<tr valign="top">
<td valign="top">Current City:<br />
</td><td valign="top">New York<br /></td>
</tr>
</table>
- I know that the value will be “Sex:” and the next tag will contain the gender.
- I know that the value will be “Current City:” and then the next tag will be the city.
I know I can iterate by tag type, but if the tags change my parser will no longer work.
Can I iterate by values and not by the type of tags?
You could load all the nodes inside <table> into an HtmlNodeCollection, then iterate through that collection: foreach (HtmlNode node in ListofNodes). Within the loop, you could check the InnerHtml of each node for your specific strings. I assume the table has the same fields each time. Alternatively, add ids or CSS classes to the markup and look for that specific id or class.
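Here is a rough sketch of that idea, matching on content instead of tag name. It assumes the HtmlAgilityPack NuGet package is referenced; LabelLookup and FindValueAfterLabel are hypothetical names made up for this example, and the HTML is the table from the question. Rather than checking InnerHtml on every element (which would also match the enclosing tr and table), it looks at text nodes, so only the innermost element holding the label is considered.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class LabelLookup
{
    // Hypothetical helper: returns the text of the first non-empty element
    // that follows the element containing `label`, or null if nothing matches.
    public static string FindValueAfterLabel(HtmlDocument doc, string label)
    {
        // Descendants() walks every node, whatever its tag name is.
        var textNodes = doc.DocumentNode.Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Text);

        foreach (HtmlNode textNode in textNodes)
        {
            string text = HtmlEntity.DeEntitize(textNode.InnerText).Trim();
            if (!string.Equals(text, label, StringComparison.OrdinalIgnoreCase))
                continue;

            // Move to the next sibling element of the label's parent,
            // skipping whitespace text nodes and empty elements such as <br>.
            HtmlNode next = textNode.ParentNode.NextSibling;
            while (next != null &&
                   (next.NodeType != HtmlNodeType.Element ||
                    HtmlEntity.DeEntitize(next.InnerText).Trim().Length == 0))
            {
                next = next.NextSibling;
            }

            if (next != null)
                return HtmlEntity.DeEntitize(next.InnerText).Trim();
        }

        return null;
    }
}

public static class Program
{
    public static void Main()
    {
        const string html = @"<table>
<tr valign=""top"">
<td valign=""top"">Sex:<br />
</td><td valign=""top"">Male<br />
</td></tr>
<tr valign=""top"">
<td valign=""top"">Current City:<br />
</td><td valign=""top"">New York<br /></td>
</tr>
</table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        Console.WriteLine(LabelLookup.FindValueAfterLabel(doc, "Sex:"));
        Console.WriteLine(LabelLookup.FindValueAfterLabel(doc, "Current City:"));
    }
}
```

This only depends on the label text and on the value being in the next sibling element, so switching td to div or li should not break it. It would still fail if the label ends up nested one level deeper than the value (for example wrapped in strong inside the td); in that case you would have to climb to an ancestor before looking for the sibling.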