I need to extract information from an unstructured web page in Android. The information I want is embedded in a table that doesn’t have an id.
<table>
<tr><td>Description</td><td></td><td>I want this field next to the description cell</td></tr>
</table>
Should I use
- pattern matching, or
- a BufferedReader to extract the information?
Or is there a faster way to get that information?
I think it makes little sense in this case to look for the fastest way to extract the information. Compared with the time it takes to download the HTML, the performance difference between the methods already suggested in the other answers is negligible.
So, assuming that by "fastest" you mean the most convenient, readable and maintainable code, I suggest you use a DocumentBuilder to parse the relevant HTML and extract the data using XPath expressions.
If you happen to retrieve invalid HTML, I recommend isolating the relevant portion first (e.g. using substring(indexOf("<table")..)) and, if necessary, correcting the remaining HTML errors with String operations before parsing. If this gets too complex, however (i.e. very bad HTML), just go with the hacky pattern matching approach suggested in other answers.
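A minimal sketch of this approach in Java follows, covering both the XPath route and the pattern-matching fallback. It assumes the page has already been downloaded into a String, that the isolated table is well-formed enough to parse as XML, and that the wanted value sits in the first non-empty cell after the "Description" cell. The class name TableExtractor and its methods are hypothetical, and the regex is tailored to the snippet in the question rather than being a general HTML parser.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: extracts the cell next to "Description" from a downloaded page.
class TableExtractor {

    // First non-empty <td> following the cell whose text is "Description".
    private static final String XPATH_EXPR =
            "//td[normalize-space(.)='Description']/following-sibling::td[normalize-space(.)!=''][1]";

    // Fallback pattern: "Description" cell, any number of empty cells, then capture the next cell.
    private static final Pattern DESCRIPTION_CELL = Pattern.compile(
            "<td>\\s*Description\\s*</td>(?:\\s*<td>\\s*</td>)*\\s*<td>(.*?)</td>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Parses only the <table> element with a DocumentBuilder; returns null if it cannot be parsed. */
    static String extractWithXPath(String html) {
        // Isolate the table so that invalid markup elsewhere on the page is never parsed.
        int start = html.indexOf("<table");
        int end = html.indexOf("</table>", start);
        if (start < 0 || end < 0) {
            return null;
        }
        String table = html.substring(start, end + "</table>".length());
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(table)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // evaluate(...) returns the string value of the first matching node, or "" if none.
            String value = xpath.evaluate(XPATH_EXPR, doc);
            return value.isEmpty() ? null : value.trim();
        } catch (Exception e) {
            // The XML parser rejects HTML that is not well-formed (e.g. mismatched or unclosed tags).
            return null;
        }
    }

    /** Regex fallback for markup the XML parser rejects; returns null if nothing matches. */
    static String extractWithRegex(String html) {
        Matcher m = DESCRIPTION_CELL.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    /** Tries the XPath route first and falls back to pattern matching. */
    static String extract(String html) {
        String value = extractWithXPath(html);
        return value != null ? value : extractWithRegex(html);
    }

    public static void main(String[] args) {
        // The <br> outside the table would break an XML parser, but it is cut off by the isolation step.
        String page = "<html><body><p>Intro<br></p><table>\n"
                + "<tr><td>Description</td><td></td><td>I want this field next to the description cell</td></tr>\n"
                + "</table></body></html>";
        System.out.println(extract(page));
    }
}
```

On the question's snippet both methods return "I want this field next to the description cell". Returning null instead of throwing lets the caller try the more maintainable XPath route first and only fall back to the regex when the markup cannot be parsed, for example when a closing tag has the wrong case.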