I’m new to parsing HTML with Perl. I’m trying to fetch both the text and the links from an HTML table.
Here is the HTML structure:
<td>Td-Text
<br>
<a href="Link-I-Want" title="title-I-Want">A-Text</a>
</td>
I’ve figured out that WWW::Mechanize is the easiest module for fetching what I need from the <a> part, but I’m not sure how to get the text from the <td>. I want the two tasks to happen back-to-back because I need to pair each cell’s <td> text with its corresponding <a> text in a hash.
Any help will be much appreciated!
Z.Zen
WWW::Mechanize is good at extracting links, but if you need to get other text, I usually combine it with HTML::TreeBuilder. Something like this:
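A minimal sketch of that combination, not a definitive implementation. It assumes the page is already in a string: the inline `$html` sample below stands in for a fetched page, and with WWW::Mechanize you would pass `$mech->content` to `new_from_content` instead. The `@cells` array and its key names are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample markup standing in for a fetched page (assumption); with
# WWW::Mechanize you would use $mech->content here instead.
my $html = <<'HTML';
<table><tr>
<td>Td-Text
<br>
<a href="Link-I-Want" title="title-I-Want">A-Text</a>
</td>
</tr></table>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# One hash per cell, pairing the cell text with its link details.
my @cells;
for my $td ($tree->look_down(_tag => 'td')) {
    # In scalar context look_down returns the first match, or undef.
    my $link = $td->look_down(_tag => 'a') or next;   # skip cells without a link
    push @cells, {
        tdText => $td->as_text,          # ALL text in the cell, link text included
        aText  => $link->as_text,
        href   => $link->attr('href'),
        title  => $link->attr('title'),
    };
}
$tree->delete;   # release the parse tree

printf "%s | %s | %s\n", @{$_}{qw(tdText aText href)} for @cells;
```

Collecting one hash per cell keeps the `<td>` text and the `<a>` details together, which is the pairing asked for in the question.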
The only problem with this code is that you don’t want all of the text in the <td> tag. How you fix that is up to you. If the $aText is sufficiently unique, you might simply strip it from the end of the <td> text with a substitution.
In the worst case, you’d have to write your own function to extract the text elements you want, stopping at the <br> (or however you determine the stopping point).
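A rough sketch of both fixes, under the same assumptions as the question's markup. `text_before_br` is a hypothetical helper name, not part of HTML::TreeBuilder; it relies on `content_list` returning child elements as objects and text nodes as plain strings.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: join a cell's text up to (not including) the first <br>.
sub text_before_br {
    my ($td) = @_;
    my $text = '';
    for my $node ($td->content_list) {
        if (ref $node) {                   # child element
            last if $node->tag eq 'br';    # stopping point
            $text .= $node->as_text;       # text of elements before the <br>
        }
        else {
            $text .= $node;                # plain text node
        }
    }
    $text =~ s/^\s+|\s+$//g;               # trim surrounding whitespace
    return $text;
}

# Sample cell taken from the question's structure.
my $html = <<'HTML';
<table><tr>
<td>Td-Text
<br>
<a href="Link-I-Want" title="title-I-Want">A-Text</a>
</td>
</tr></table>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);
my $td   = $tree->look_down(_tag => 'td');
my $link = $td->look_down(_tag => 'a');

# Option 1: strip the link text (and nearby whitespace) from the end of the cell text.
my $aText  = $link->as_text;
my $tdText = $td->as_text;
$tdText =~ s/\s*\Q$aText\E\s*$//;

# Option 2: walk the cell's children and stop at the <br>.
my $before_br = text_before_br($td);

print "$tdText\n$before_br\n";
$tree->delete;
```

Option 1 only works when the link text appears once, at the end of the cell; Option 2 does not depend on the link text at all, only on the `<br>` being the separator.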