I need to extract address information from web pages. Some pages are easier than others. I’m looking for a Firefox plugin, Windows app, or VB.NET code that will help me get this done.
Ideally I would like to have a web page on our admin site (ASP.NET/VB.NET) where you enter a URL and it scrapes the page and returns a DataSet that I can put in a grid.
If you know the format of the page (for instance, if they’re all like that ashnha.com page) then it’s fairly easy to write VB.NET code that does this:
The tough bit is writing the regex, which is a bit of a black art. See regexlib.com for loads of tools, books, etc. about regexes.
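To give a feel for the regex approach, here is a minimal sketch. It is in Python rather than VB.NET purely for brevity, and the page layout it assumes (spans with class names like "name", "street", "city", "state" and "zip") is made up for illustration; the real pattern has to be written against the actual page source. The names ADDRESS_RE, extract_addresses and scrape are likewise hypothetical.

```python
import re
from urllib.request import urlopen

# ASSUMPTION: every address on the page is marked up roughly like this:
#   <span class="name">...</span> <span class="street">...</span>
#   <span class="city">...</span>, <span class="state">NH</span> <span class="zip">03301</span>
# This layout is invented for the example; adjust the pattern to the real HTML.
ADDRESS_RE = re.compile(
    r'<span class="name">(?P<name>[^<]*)</span>\s*'
    r'<span class="street">(?P<street>[^<]*)</span>\s*'
    r'<span class="city">(?P<city>[^<]*)</span>,?\s*'
    r'<span class="state">(?P<state>[A-Z]{2})</span>\s*'
    r'<span class="zip">(?P<zip>\d{5}(?:-\d{4})?)</span>'
)


def extract_addresses(html):
    """Return one dict per address found in the HTML, with whitespace trimmed."""
    return [
        {field: value.strip() for field, value in match.groupdict().items()}
        for match in ADDRESS_RE.finditer(html)
    ]


def scrape(url):
    """Download the page at `url` and extract its addresses."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_addresses(html)


# Small inline sample in the assumed layout, so the sketch runs without a network call.
SAMPLE = """
<div class="listing">
  <span class="name">Acme Hardware</span>
  <span class="street">12 Main St</span>
  <span class="city">Concord</span>, <span class="state">NH</span> <span class="zip">03301</span>
</div>
"""

if __name__ == "__main__":
    for row in extract_addresses(SAMPLE):
        print(row)
```

Each dict is effectively one row of the result table. In VB.NET the same idea would probably use Regex.Matches from System.Text.RegularExpressions and add one DataTable row per match, which you can then bind to a grid.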
If the HTML format isn’t well-defined enough for a regex, then you’re probably going to have to rely on some amount of user intervention in order to identify which bits are the addresses…