I know the ? operator enables “non greedy” mode, but I am running into a problem, I can’t seem to get around. Consider a string like this:
my $str = '<a>sdkhfdfojABCasjklhd</a><a>klashsdjDEFasl;jjf</a><a>askldhsfGHIasfklhss</a>';
where there are opening and closing tags <a> and </a>, there are keys ABC, DEF and GHI but are surrounded by some other random text. I want to replace the <a>klashsdjDEFasl;jjf</a> with <b>TEST</b> for example. However, if I have something like this:
$str =~ s/<a>.*?DEF.*?<\/a>/<b>TEST><\/b>/;
Even with the non greedy operators .*?, this does not do what I want. I know why it does not do it, because the first <a> matches the first occurrence in the string, and matches all the way up to DEF, then matches to the nearest closing </a>. What I want however is a way to match the closest opening <a> and closing </a> to “DEF” though. So currently, I get this as the result:
<a>TEST</b><a>askldhsfGHIasfklhss</a>
Where as I am looking for something to get this result:
<a>sdkhfdfojABCasjklhd</a><b>TEST</b><a>askldhsfGHIasfklhss</a>
By the way, I am not trying to parse HTML here, I know there are modules to do this, I am simply asking how this could be done.
Thanks,
Eric Seifert
The problem is that even with non-greedy matching, Perl is still trying to find the match that starts at the leftmost possible point in the string. Since
.*?can match<a>or</a>, that means it will always find the first<a>on the line.Adding a greedy
(.*)at the beginning causes it to find the last possible matching<a>on the line (because.*first grabs the whole line, and then backtracks until a match is found).One caveat: Because it finds the rightmost match first, you can’t use this technique with the
/gmodifier. Any additional matches would be inside$1, and/gresumes the search where the previous match ended, so it won’t find them. Instead, you’d have to use a loop like: