I am trying to do a screen scrape in Perl and have it down to an array of table elements.
The string:
<tr>
<td>10:11:00</td>
<td><a href="/page/controller/33">712</a></td>
<td>Start</td>
<td>Finish</td>
<td>200</td>
<td>44</td>
Code:
if ($item =~ /<td>(.*)?<\/td>/)
{
    print "\t$item\n";
    print "\t1: $1\n";
    print "\t2: $2\n";
    print "\t3: $3\n";
    print "\t4: $4\n";
    print "\t5: $5\n";
    print "\t6: $6\n";
}
Output:
1: 10:11:00
2:
3:
4:
5:
6:
I tried multiple things but could not get the intended results. Any thoughts?
The code behaves exactly as you told it to. This is what happens:
You matched the regex exactly once. It did match, and populated the $1 variable with the value of the first (and only!) capture buffer. The match returns “true”, and the code in the if-branch is executed. Because the regex contains only one pair of capturing parentheses, $2 through $6 are never set.

You want to do two things:

Use the /g modifier. This matches globally, and tries to return every match in the string, not just the first one.

This would lead to the following code:
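A minimal sketch of such code, assuming the row is held in $item as in the question. Two choices here are assumptions rather than something stated above: the match is done in list context so that every capture is returned at once, and the capture is written as (.*?) so that each match stops at the nearest closing tag. The @cells array and the loop counter are illustrative names.

```perl
use strict;
use warnings;

# Sample row copied from the question; $item is the variable name used there.
my $item = <<'HTML';
<tr>
<td>10:11:00</td>
<td><a href="/page/controller/33">712</a></td>
<td>Start</td>
<td>Finish</td>
<td>200</td>
<td>44</td>
HTML

# With /g in list context, the match returns the captured text of every
# match in the string instead of stopping after the first one.
# (.*?) is non-greedy, so each capture ends at the nearest </td>.
my @cells = $item =~ /<td>(.*?)<\/td>/g;

# Print the cells numbered from 1, mirroring the output format in the question.
my $n = 1;
for my $cell (@cells) {
    print "\t$n: $cell\n";
    $n++;
}
```

Note that the second cell still contains the anchor markup around 712; stripping it would need a further step.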
Do also note that parsing HTML with regexes is evil, and you should search CPAN for a module you like that does that for you.