I'm trying to do some screen scraping and discovered the HTML Agility Pack, but I'm having trouble figuring out how to use it with VB.NET.
The first thing I want to do is find the URL in the HREF attribute of an A tag when I only know the text enclosed by that tag.
The second thing I want to do is parse an HTML table, going through each row and pulling out the data so I can save it to a database (after some basic analysis).
Here is a good starting point on SO: How to use HTML Agility pack
See also this: HtmlAgilityPack example for changing links doesn't work. How do I accomplish this?
And this: Finding all the A HREF Urls in an HTML document (even in malformed HTML)
To find a specific HREF, the XPath syntax would be "//a[@href='your url']", meaning: "get any A tag that has an HREF attribute equal to 'your url'".
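As a rough illustration of that XPath in VB.NET, here is a minimal sketch. It assumes the HtmlAgilityPack assembly is referenced in the project; the sample markup and the module name FindByHref are made up for the example.

```vb
Imports HtmlAgilityPack

Module FindByHref
    Sub Main()
        ' Sample markup (an assumption for this sketch); in practice this
        ' would be the HTML of the page being scraped.
        Dim html As String = "<p><a href=""homepage.html"">Cars</a> <a href=""boats.html"">Boats</a></p>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' Select the first A tag whose HREF attribute equals homepage.html.
        ' SelectSingleNode returns Nothing when no node matches.
        Dim link As HtmlNode = doc.DocumentNode.SelectSingleNode("//a[@href='homepage.html']")

        If link IsNot Nothing Then
            ' Write the text enclosed by the matching A tag.
            Console.WriteLine(link.InnerText)
        End If
    End Sub
End Module
```

If several links can share the same HREF, SelectNodes with the same expression returns all of them; it returns Nothing rather than an empty collection when there is no match, so check for that before looping.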
EDIT:
To find an HREF if you only know the text, for example if you have the HTML '<a href="homepage.html">Cars</a>' and want to get homepage.html from the text Cars, then this is how you would do it.
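One way this lookup might be sketched in VB.NET, assuming the HtmlAgilityPack assembly is referenced and that the link text matches exactly (the comparison is case-sensitive and does not trim whitespace). The module name FindHrefByText is only for illustration.

```vb
Imports HtmlAgilityPack

Module FindHrefByText
    Sub Main()
        ' The sample markup from the text above.
        Dim html As String = "<a href=""homepage.html"">Cars</a>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' Select the first A tag whose text is exactly "Cars".
        Dim link As HtmlNode = doc.DocumentNode.SelectSingleNode("//a[text()='Cars']")

        If link IsNot Nothing Then
            ' Read the HREF attribute; the second argument is the value
            ' returned when the attribute is missing.
            Dim url As String = link.GetAttributeValue("href", String.Empty)
            Console.WriteLine(url)
        End If
    End Sub
End Module
```

If the link text may carry surrounding whitespace, an expression such as "//a[normalize-space(.)='Cars']" is a looser alternative worth trying.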