I need to read an HTML file and find a certain paragraph tag with specific text in it. Once I find that tag, I want the text from all the following tags until I reach a table tag.
Example:
<asdf>
</asdf>
<p>THE SIGNAL TO GET INFO</p>
<something>some good stuff in here</something>
<p>something else</p>
<ul>
<li>something good in here for sure</li>
<li>this too</li>
</ul>
<table> I DON'T WANT THIS </table>
I can find the first paragraph tag with HTML::TokeParser like this:
my $description = "";
my $tp = HTML::TokeParser->new(\$content) || die "Can't open: $!";
while (my $token = $tp->get_tag("p")) {
    my $paragraph = $tp->get_trimmed_text("/p");
    if ($paragraph =~ /On this page/) {
        until ((my $stop = $tp->get_token)->[1] eq "table") {
            if ( $stop->[0] eq "S" ) {
                print $stop->[0],"\n";
            }
        }
        return $description;
    }
}
I’ve tried the above code… but something is desperately wrong with it, since it won’t even compile.
Thanks for your help.
You probably want to call $tp->get_token, storing the data until you see ["S", "table"…].
You say you couldn't get this to work. Can you explain why, and what you did see? Perhaps provide a full example for people to play with.
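For what it's worth, here is a minimal sketch of the get_token approach suggested above. It is not the original poster's code: the sub name text_after_signal and its two parameters are made up for illustration, and it assumes you only want the text tokens, joined one per line. Tokens from get_token are array refs whose first element is the type ("S" for a start tag, "T" for text), and text tokens come back with entities undecoded, hence the decode_entities call.

```perl
use strict;
use warnings;
use HTML::TokeParser;
use HTML::Entities qw(decode_entities);

# Hypothetical helper: return the text found after the first <p> whose
# content matches $signal, stopping at the first <table> start tag.
sub text_after_signal {
    my ($html, $signal) = @_;
    my $tp = HTML::TokeParser->new(\$html) or die "Can't parse HTML";

    while ($tp->get_tag("p")) {
        my $paragraph = $tp->get_trimmed_text("/p");
        next unless $paragraph =~ $signal;

        my @chunks;
        while (my $token = $tp->get_token) {
            my $type = $token->[0];

            # ["S", $tag, ...] is a start tag; stop at <table>.
            last if $type eq "S" && $token->[1] eq "table";

            # ["T", $text, $is_data] is text; skip everything else.
            next unless $type eq "T";
            my ($text, $is_data) = @{$token}[1, 2];
            $text = decode_entities($text) unless $is_data;
            $text =~ s/^\s+|\s+$//g;          # trim surrounding whitespace
            push @chunks, $text if length $text;
        }
        return join "\n", @chunks;
    }
    return;    # no paragraph matched the signal
}
```

You would call it as text_after_signal($content, qr/THE SIGNAL TO GET INFO/). Checking the token type before comparing the tag name matters, because for a text token the second element is the text itself, not a tag name.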
Well, you didn’t provide example output, so I made some assumptions.
Produces: