I need some advice, and ideally code examples, for parsing an HTML table from a website. I’m using the WebClient class to download the HTML from an address. I then need to find the table I want the data from. For example, if the table is <table id="cia_list">, I want to loop through its <td> tags and get just the text inside them. What would be the best way to approach this?
In the past I have converted the HTML to XML and then used XSLT to extract the results. If you want to take this approach, I would recommend looking at SgmlReader, which will handle the HTML-to-XML conversion.
People will often attempt to use regex to do what you are talking about. This is something I typically advise against. Here is an amusing post that goes over some of the reasons not to do this:
RegEx match open tags except XHTML self-contained tags
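To illustrate the general idea of using a real parser instead of regex, here is a minimal sketch in Python using only the standard library's `html.parser`. It is not the HTML-to-XML plus XSLT route described above, just the same "let a parser walk the markup" principle. The class name `TableCellExtractor` and the sample HTML are made up for the example, and it assumes every `<td>` is explicitly closed.

```python
from html.parser import HTMLParser


class TableCellExtractor(HTMLParser):
    """Collect the text of each <td> in the table with a given id (hypothetical helper)."""

    def __init__(self, table_id):
        super().__init__(convert_charrefs=True)
        self.table_id = table_id
        self.depth = 0        # 0 = outside the target table, 1 = directly inside it
        self.in_cell = False  # True while inside a <td> of the target table
        self.parts = []       # text fragments of the current cell
        self.cells = []       # finished cell texts, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.depth > 0:
                self.depth += 1          # a table nested inside the target table
            elif dict(attrs).get("id") == self.table_id:
                self.depth = 1           # found the table we want
        elif tag == "td" and self.depth == 1:
            self.in_cell = True
            self.parts = []

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1
        elif tag == "td" and self.in_cell and self.depth == 1:
            # Assumes cells are explicitly closed with </td>.
            self.cells.append("".join(self.parts).strip())
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.parts.append(data)


# Made-up sample markup; in practice this would be the downloaded page.
sample_html = """
<table id="other"><tr><td>skip me</td></tr></table>
<table id="cia_list">
  <tr><td>Alpha</td><td> Beta &amp; Gamma </td></tr>
  <tr><td><b>Delta</b></td></tr>
</table>
"""

parser = TableCellExtractor("cia_list")
parser.feed(sample_html)
parser.close()
print(parser.cells)
```

Inline markup such as `<b>` inside a cell is ignored and only its text is kept, which matches the "just the text inside them" requirement. Real-world pages are often messier than this sample, so a more forgiving HTML library may be a better fit than a hand-rolled handler.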