I wrote a small Perl script with regular expressions to get HTML components of


Editorial Team

Asked: June 8, 2026


I wrote a small Perl script with regular expressions to get HTML components of a website.

I know it's not a good way of doing this kind of job, but I wanted to test my regex skills.

When run with either one of the two regex patterns in the while loop, it runs perfectly and displays the correct output. But when I check both patterns in the while loop, the second pattern matches every time and the loop runs forever.

My script:

#!/usr/bin/perl -w
use strict;

while (<STDIN>) {

    while ( (m/<span class=\"itempp\">([^<]+)+?<\/span>/g) ||
            (m/<font size=\"-1\">([^<]+)+?<\/font>/g) ) {
        print "$1\n";
    }
}

I am testing the above script with a sample input:

<a href="http://linkTest">Link title</a>
<span class="itempp">$150</span>
<font size="-1"> (Location)</font>

Desired output:

$150
(Location)

Thank you! Any help would be highly appreciated!


1 Answer

Editorial Team · Answer 1 · 2026-06-08T17:06:42+00:00

Whenever a global (/g) match fails, it resets the position where the next global match on that string will start searching. So when the first of your two patterns fails, it forces the second to search from the beginning of the string again, where it finds the same match every time.

This behaviour can be disabled by adding the /c modifier, which leaves the position unchanged if a regex fails to match.
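To make the reset visible, here is a minimal sketch that inspects pos() after a failed match with and without /c. The string and the variable names are made up for illustration and are not part of the original script.

```perl
use strict;
use warnings;

my $text = 'aXbX';   # hypothetical sample string

# Without /c: a failed global match resets pos() to undef.
my $ok = $text =~ /X/g;          # matches the first X
my $after_match = pos($text);    # position just past that X
$ok = $text =~ /Z/g;             # fails, so pos() is reset
my $after_fail = pos($text);

# With /c: a failed global match leaves pos() where it was.
$ok = $text =~ /X/g;             # starts from the beginning again
$ok = $text =~ /Z/gc;            # fails, but pos() is kept
my $after_fail_c = pos($text);

printf "after match:      %s\n", $after_match;
printf "after failed /g:  %s\n", defined $after_fail ? $after_fail : 'undef';
printf "after failed /gc: %s\n", $after_fail_c;
```

In the question's loop, the failed first pattern plays the role of /Z/g here: it wipes the position that the second pattern had just recorded.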

In addition, you can improve your patterns by removing the escape characters (" doesn’t need escaping and / needn’t be escaped if you choose a different delimiter) and the superfluous +? after the captures.

Also, use warnings is much better than -w on the command line, because it is lexically scoped rather than global.

Here is a working version of your code.

use strict;
use warnings;

while (<STDIN>) {

    while( m|<span class="itempp">([^<]+)</span>|gc
            or m|<font size="-1">([^<]+)</font>|gc ) {
        print "$1\n";
    }
}
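As an alternative that is not part of the answer above, the two patterns could be folded into a single global match with alternation, so only one regex ever moves the search position. This is a rough sketch under that assumption. The $html string stands in for STDIN and @found is an illustrative name.

```perl
use strict;
use warnings;

# Sample input from the question, held in a string instead of STDIN.
my $html = <<'END';
<a href="http://linkTest">Link title</a>
<span class="itempp">$150</span>
<font size="-1"> (Location)</font>
END

my @found;
for my $line (split /\n/, $html) {
    # One global match per line; the alternative that matched fills $1 or $2.
    while ($line =~ m{<span class="itempp">([^<]+)</span>|<font size="-1">([^<]+)</font>}g) {
        push @found, defined $1 ? $1 : $2;
    }
}

print "$_\n" for @found;
```

Because there is only one pattern, a failed match simply ends the inner loop and /c is not needed. Note that the captured text keeps any whitespace inside the tag, such as the space before (Location) in the sample.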
