What is the best approach on Android for retrieving information from an HTML page hosted on the internet?
For example, I’d like to be able to get the text from the following page at the start of each day:
http://www.met.ie/forecasts/sea-area.asp
I have been downloading and parsing XML files, but I have never tried to parse information from an HTML file before.
Is there a native way to parse the information I want?
Or do I need a third party library?
Or do I need to look into screen scraping?
If you are parsing HTML, regardless of how you do it, you are screen scraping. Techniques run the gamut from regular expressions to third-party libraries like jTidy. The only question is whether jTidy works on Android; I don’t know, so you’ll have to research it.
I’d suggest using regular expressions: compile them once and cache the Pattern object for performance.
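To make that concrete, here is a rough sketch of what I mean by compiling once and caching. The class name ForecastScraper and the assumption that the text you want sits inside <p> elements are mine, purely for illustration; I haven't checked the actual markup of that page, so you'd need to adjust the pattern to match it. It only uses java.util.regex, which is available on Android.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: the class name and the <p>-based pattern are assumptions.
public class ForecastScraper {
    // Compiled once when the class loads, then reused for every page.
    private static final Pattern PARAGRAPH = Pattern.compile(
            "<p\\b[^>]*>(.*?)</p>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Matches any remaining inline tag (e.g. <b>, <br/>) so it can be stripped.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Returns the text of each non-empty <p> element, with nested tags removed.
    public static List<String> extractParagraphs(String html) {
        List<String> result = new ArrayList<String>();
        Matcher m = PARAGRAPH.matcher(html);
        while (m.find()) {
            String text = TAG.matcher(m.group(1)).replaceAll("").trim();
            if (!text.isEmpty()) {
                result.add(text);
            }
        }
        return result;
    }
}
```

You would download the page as a String (the same way you already fetch your XML files) and pass it to extractParagraphs. Because the Pattern objects are static final fields, the expressions are not recompiled on each call.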
If you can’t get a proper web service API for the data you want, you always run the risk of the author changing the layout, moving the data, and breaking your code. That’s why screen scraping is generally frowned upon and only used as a last-ditch effort.