What is the preferred way to extract elements from an HTML page in Java?
My HTML has many of the following rows:
<tr class="item-odd">
<td class="data"><a href="http://.....">TITLE</a></td>
<td><div class="cost">$1.99</div></td>
</tr>
The class alternates item-odd and item-even.
I need to extract:
- URL
- Title
- Price
Are regular expressions the way to go?
I’d use a library like HTML Parser for this job. Have a look at the samples and/or the javadoc. Also have a look at previous questions here on SO.
HTML Parser is pretty easy to use and should do the job. For alternatives, have a look at this previous answer.
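To give a rough idea of what the parser-based approach looks like, here is a minimal sketch. Note that it does not use the HTML Parser library mentioned above; it uses the HTML parser that ships with the JDK (`javax.swing.text.html`) so that it runs without extra dependencies. The class name `RowExtractor` and the `Item` holder are made up for illustration, and the sketch assumes the rows look like the ones in the question (a `tr` whose class starts with `item-`, one link, and one `div` with class `cost`).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class RowExtractor {

    /** One extracted row (hypothetical holder type, not part of any library). */
    public static final class Item {
        public String url;
        public String title;
        public String price;
    }

    /** Collects URL, title and price from every tr whose class starts with "item-". */
    public static List<Item> extract(String html) throws IOException {
        final List<Item> items = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private Item current;    // row being filled in, or null outside a row
            private boolean inLink;  // true while inside the row's <a>
            private boolean inCost;  // true while inside <div class="cost">

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                Object cls = attrs.getAttribute(HTML.Attribute.CLASS);
                if (tag == HTML.Tag.TR && cls != null && cls.toString().startsWith("item-")) {
                    current = new Item();
                } else if (current != null && tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    current.url = (href == null) ? null : href.toString();
                    inLink = true;
                } else if (current != null && tag == HTML.Tag.DIV
                        && "cost".equals(String.valueOf(cls))) {
                    inCost = true;
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (current == null) {
                    return;
                }
                String text = new String(data).trim();
                if (inLink) {
                    current.title = text;
                } else if (inCost) {
                    current.price = text;
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.A) {
                    inLink = false;
                } else if (tag == HTML.Tag.DIV) {
                    inCost = false;
                } else if (tag == HTML.Tag.TR && current != null) {
                    items.add(current);
                    current = null;
                }
            }
        };

        // The third argument tells the parser to ignore any charset declared in the page.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return items;
    }
}
```

Calling `RowExtractor.extract(pageSource)` returns one `Item` per matching row. A dedicated library will usually give you shorter code (filters or selectors instead of hand-written callbacks), but the idea is the same: let a parser walk the tags and pick out the attribute and text values you need.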