I wrote a small Perl script with regular expressions to get HTML components of a website.
I know it's not a good way of doing this kind of job, but I was trying to test out my regex skills.
When run with either one of the two regex patterns in the while loop, it runs perfectly and displays the correct output. But when I try to check both patterns in the while loop, the second pattern matches every time and the loop runs infinitely.
My script:
#!/usr/bin/perl -w
use strict;
while (<STDIN>) {
    while ( (m/<span class=\"itempp\">([^<]+)+?<\/span>/g) ||
            (m/<font size=\"-1\">([^<]+)+?<\/font>/g) ) {
        print "$1\n";
    }
}
I am testing the above script with a sample input:
<a href="http://linkTest">Link title</a>
<span class="itempp">$150</span>
<font size="-1"> (Location)</font>
Desired output:
$150
(Location)
Thank you! Any help would be highly appreciated!
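Whenever a global (/g) regex fails to match, it resets the position from which the next global regex on that string will start searching. So when the first of your two patterns fails, it forces the second to search from the beginning of the string again, which is why the second pattern keeps matching the same text and the loop never ends.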
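The reset can be seen directly with pos, which reports where the next /g match on a string will start. This is only a small sketch: the string 'aXbX' and the patterns are made up for illustration and are not part of your script.

```perl
use strict;
use warnings;

# Hypothetical test string, chosen only to show how pos() behaves.
my $s = 'aXbX';

# A successful /g match leaves pos() just past the match (here: 2).
print "first X found, pos is ", pos($s), "\n" if $s =~ /X/g;

# A failed /g match resets pos() to undef.
unless ( $s =~ /Q/g ) {
    print "Q not found, pos is ", ( defined pos($s) ? pos($s) : 'undef' ), "\n";
}

# So the next /g match starts from the beginning and finds the first X again.
print "X found again, pos is ", pos($s), "\n" if $s =~ /X/g;
```

The last line reports position 2 again rather than 4, which is the same effect that keeps your inner while loop spinning on the font line.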
This behaviour can be disabled by adding the /c modifier, which leaves the position unchanged if a regex fails to match.
In addition, you can improve your patterns by removing the escape characters (" doesn’t need escaping, and / needn’t be escaped if you choose a different delimiter) and the superfluous +? after the captures.
Also, use warnings is much better than -w on the command line.
Here is a working version of your code.
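A minimal sketch along those lines, applying the three changes described above (the /c modifier, braces as the delimiter so that nothing needs escaping, and use warnings in place of -w). The choice of m{...} as the delimiter is just one option; any delimiter that does not appear in the patterns would do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

while (<STDIN>) {
    # /c keeps pos() where it is when a pattern fails, so the second
    # pattern carries on from the last match instead of restarting.
    while ( m{<span class="itempp">([^<]+)</span>}gc
         || m{<font size="-1">([^<]+)</font>}gc ) {
        print "$1\n";
    }
}
```

With your sample input this should print the price and the location on separate lines. Note that the second capture keeps the space before (Location), because [^<]+ matches everything between the tags, including leading whitespace.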