What I’m trying to do sounds really easy, but I’ve been struggling with it for several hours now, so please point me in the right direction.
I’ve got some HTML that looks like this:
<img src="random.jpg" class="someClass" id="someId" alt="test" />
and currently I cannot match this with this code:
my $tp = HTML::TokeParser->new(\$rawHTML) || die "Can't open: $!";
while (my $token = $tp->get_token) {
    my $ttype = shift @{ $token };
    if ($ttype eq "S") {
        my ($tag, $attr, $attrseq, $rawtxt) = @{ $token };
        if ($tag eq "img") {
            if (($attr->{'class'} eq "someClass") && ($attr->{'id'} eq "someId")) {
                my $alttext = $attr->{'alt'};
                print "AltText: $alttext";
                ...
            }
        }
    }
}
It seems that TokeParser just ignores self-closing tags (<…/>).
Why? I’ve searched long and hard for a solution for this and would really appreciate any help to make it work with TokeParser or any other Perl module…
Thanks!
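It doesn’t ignore anything. A self-closing <img … /> comes back from get_token as an ordinary start-tag ("S") token; the trailing slash simply shows up as an extra attribute named "/".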
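As a minimal, self-contained sketch of that behaviour (not the code originally posted with this answer), the script below feeds the `<img>` line from the question to HTML::TokeParser and collects the matching alt text. The helper name `matching_alt_texts` is made up for illustration, and the `//` operator assumes Perl 5.10 or newer.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the alt text of every <img> start tag
# whose class and id match the given values.
sub matching_alt_texts {
    my ($html, $class, $id) = @_;
    my $tp = HTML::TokeParser->new(\$html)
        or die "Can't create parser";
    my @alts;
    # get_token returns an array ref; start tags look like
    # ["S", $tag, $attr, $attrseq, $text].
    while (my $token = $tp->get_token) {
        my ($type, $tag, $attr) = @$token;
        next unless $type eq 'S' && $tag eq 'img';
        # Default missing attributes to '' to avoid "uninitialized" warnings.
        next unless ($attr->{class} // '') eq $class
                 && ($attr->{id}    // '') eq $id;
        push @alts, $attr->{alt} // '';
    }
    return @alts;
}

my $rawHTML = '<img src="random.jpg" class="someClass" id="someId" alt="test" />';
print "AltText: $_\n" for matching_alt_texts($rawHTML, 'someClass', 'someId');
```

With the markup from the question this should print `AltText: test`, so if nothing matches in the real program, the contents of `$rawHTML` (or code outside the posted loop) are the more likely culprit.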
BTW, HTML::TokeParser::Simple gives you a better interface.
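For comparison, here is a rough sketch of the same lookup with HTML::TokeParser::Simple, assuming the module is installed from CPAN. Tokens become objects, so the array unpacking goes away. The helper name `alt_texts_simple` is hypothetical; `is_start_tag` and `get_attr` are the module's token methods.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper: same filter as before, using token objects
# instead of raw array refs.
sub alt_texts_simple {
    my ($html, $class, $id) = @_;
    my $p = HTML::TokeParser::Simple->new(string => $html)
        or die "Can't create parser";
    my @alts;
    while (my $token = $p->get_token) {
        # Skip everything that is not an <img> start tag.
        next unless $token->is_start_tag('img');
        next unless ($token->get_attr('class') // '') eq $class
                 && ($token->get_attr('id')    // '') eq $id;
        push @alts, $token->get_attr('alt') // '';
    }
    return @alts;
}

my $rawHTML = '<img src="random.jpg" class="someClass" id="someId" alt="test" />';
print "AltText: $_\n" for alt_texts_simple($rawHTML, 'someClass', 'someId');
```

The named methods make the intent of each check easier to read than positional access into the token array, which is the main appeal of the wrapper.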