The Archive Base

Editorial Team
Asked: May 22, 2026 (2026-05-22T02:17:45+00:00)

I am having a problem with a non-greedy regular expression (regex). I’ve seen that there are questions regarding non-greedy regex, but they don’t answer my problem.

Problem: I am trying to match the href of the "lol" anchor.

Note: I know this can be done with Perl HTML parsing modules, and my question is not about parsing HTML in Perl. My question is about the regular expression itself and the HTML is just an example.

Test case: I have four tests each for .*? and [^"]. The first two produce the expected result. The third does not, and the fourth does, but I don’t understand why.

  1. Why does the third test fail in both sets of tests (.*? and [^"])? Shouldn’t the non-greedy operator work?
  2. Why does the fourth test work in both sets? I don’t understand why adding a .* in front changes the result (the third and fourth tests are identical except for the leading .*).

I probably don’t understand exactly how these regex work. A Perl Cookbook recipe mentions something, but I don’t think it answers my question.

use strict;
use warnings;

my $content=<<EOF;
<a href="/hoh/hoh/hoh/hoh/hoh" class="hoh">hoh</a>
<a href="/foo/foo/foo/foo/foo" class="foo">foo </a>
<a href="/bar/bar/bar/bar/bar" class="bar">bar</a>
<a href="/lol/lol/lol/lol/lol" class="lol">lol</a>
<a href="/koo/koo/koo/koo/koo" class="koo">koo</a>
EOF

print "| $1 | \n\nThat's ok\n" if $content =~ m~href="(.*?)"~s ;

print "\n---------------------------------------------------\n";

print "| $1 | \n\nThat's ok\n" if $content =~ m~href="(.*?)".*>lol~s ;

print "\n---------------------------------------------------\n";

print "| $1 | \n\nWhy does not the 2nd non-greedy '?' work?\n"
  if $content =~ m~href="(.*?)".*?>lol~s ;

print "\n---------------------------------------------------\n";

print "| $1 | \n\nIt now works if I put the '.*' in the front?\n"
  if $content =~ m~.*href="(.*?)".*?>lol~s ;

print "\n###################################################\n";
print "Let's try now with [^]";
print "\n###################################################\n\n";


print "| $1 | \n\nThat's ok\n" if $content =~ m~href="([^"]+?)"~s ;

print "\n---------------------------------------------------\n";

print "| $1 | \n\nThat's ok.\n" if $content =~ m~href="([^"]+?)".*>lol~s ;

print "\n---------------------------------------------------\n";

print "| $1 | \n\nThe 2nd non-greedy still doesn't work?\n"
  if $content =~ m~href="([^"]+?)".*?>lol~s ;

print "\n---------------------------------------------------\n";

print "| $1 | \n\nNow with the '.*' in front it does.\n"
  if $content =~ m~.*href="([^"]+?)".*?>lol~s ;
1 Answer

  1. Editorial Team
    Added an answer on May 22, 2026 at 2:17 am (2026-05-22T02:17:46+00:00)

    Try printing out $& (the text matched by the entire regex) as well as $1. This may give you a better idea of what’s happening.
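
    As a rough sketch of that suggestion, the snippet below runs the third test from the question and prints both $& and $1. The variable names $whole and $href are only illustrative, and the input is rebuilt with join instead of the original heredoc.

    ```perl
    use strict;
    use warnings;

    # Same five anchors as in the question, joined with newlines.
    my $content = join "\n",
        '<a href="/hoh/hoh/hoh/hoh/hoh" class="hoh">hoh</a>',
        '<a href="/foo/foo/foo/foo/foo" class="foo">foo </a>',
        '<a href="/bar/bar/bar/bar/bar" class="bar">bar</a>',
        '<a href="/lol/lol/lol/lol/lol" class="lol">lol</a>',
        '<a href="/koo/koo/koo/koo/koo" class="koo">koo</a>',
        '';

    my ($whole, $href) = ('', '');

    # Third test from the question.
    if ($content =~ m~href="(.*?)".*?>lol~s) {
        # Copy right away: the next successful match overwrites $& and $1.
        ($whole, $href) = ($&, $1);
    }

    print "captured (\$1): $href\n";       # prints /hoh/hoh/hoh/hoh/hoh
    print "whole match (\$&):\n$whole\n";  # runs from the first href= through >lol
    ```

    Seeing that $& spans four lines makes it clear that the match began at the first anchor and only ended at the "lol" one.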

    The problem seems to be that .*? does not mean “Of all possible matches, find the one that uses the fewest characters here.” It only means “First try matching 0 characters here and go on to match the rest of the regex. If that fails, try 1 character, then 2 characters, and so on, until the rest of the regex matches.”
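
    A small illustration of that rule, using a made-up string rather than the HTML from the question: the lazy group stays short only while the rest of the pattern can still match, and grows as far as it has to otherwise.

    ```perl
    use strict;
    use warnings;

    # Hypothetical test string, chosen so that the first X is not followed by '!'.
    my $text = 'a1X2X!';

    # Lazy, and the first X is good enough: the group stops after one character.
    my ($lazy_short) = $text =~ /a(.*?)X/;

    # Lazy, but only the second X is followed by '!', so the group must keep growing.
    my ($lazy_forced) = $text =~ /a(.*?)X!/;

    # Greedy: takes as much as possible, then gives back until an X fits.
    my ($greedy) = $text =~ /a(.*)X/;

    print "lazy, short:  $lazy_short\n";   # prints 1
    print "lazy, forced: $lazy_forced\n";  # prints 1X2
    print "greedy:       $greedy\n";       # prints 1X2
    ```

    In the second pattern the lazy group ends up exactly as long as the greedy one, because no shorter length lets the rest of the regex match.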

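    Perl will always find the match that starts closest to the beginning of the string. Since most of your patterns start with href=, it will find the first href= in the string and see if there’s any way to expand the repetitions to get a match beginning there. If it can’t get a match, it’ll try starting at the next href=, and so on. In your third test the attempt at the first href= can succeed: with the /s modifier, .*? is allowed to stretch across the intervening lines until it reaches >lol, so $1 holds the first href.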
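
    To watch the starting position move, one option is to compare $-[0] (the offset where the last successful match began) for two patterns. The second pattern is a hypothetical variant, not something from the question: its [^"]* and [^>]* cannot leave the current tag, so the attempts at the earlier href= positions fail and Perl moves on to the next one.

    ```perl
    use strict;
    use warnings;

    # Same five anchors as in the question, joined with newlines.
    my $content = join "\n",
        '<a href="/hoh/hoh/hoh/hoh/hoh" class="hoh">hoh</a>',
        '<a href="/foo/foo/foo/foo/foo" class="foo">foo </a>',
        '<a href="/bar/bar/bar/bar/bar" class="bar">bar</a>',
        '<a href="/lol/lol/lol/lol/lol" class="lol">lol</a>',
        '<a href="/koo/koo/koo/koo/koo" class="koo">koo</a>',
        '';

    # Third test from the question: the attempt at the first href= succeeds.
    $content =~ m~href="(.*?)".*?>lol~s or die "no match";
    my ($lazy_start, $lazy_href) = ($-[0], $1);

    # Illustrative variant: nothing here can cross a '"' or a '>', so a match
    # is only possible when starting at the href= inside the "lol" anchor.
    $content =~ m~href="([^"]*)"[^>]*>lol~ or die "no match";
    my ($strict_start, $strict_href) = ($-[0], $1);

    printf "lazy:   starts at offset %d, captured %s\n", $lazy_start, $lazy_href;
    printf "strict: starts at offset %d, captured %s\n", $strict_start, $strict_href;
    ```

    The first line reports the offset of the first href= and the /hoh/... path; the second reports the offset of the href= in the "lol" anchor and the /lol/... path.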

    When you add a greedy .* to the beginning of the regex, matching starts with the .* grabbing as many characters as it can. Perl then backtracks to find a href=. Essentially, this causes it to try the last href= in the string first, and work towards the beginning of the string.
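
    A minimal check of that behaviour, again on the question's input: the fourth test captures the "lol" href, yet the match as a whole still starts at offset 0, because the leading .* has consumed everything before it. The names $start and $href are only illustrative.

    ```perl
    use strict;
    use warnings;

    # Same five anchors as in the question, joined with newlines.
    my $content = join "\n",
        '<a href="/hoh/hoh/hoh/hoh/hoh" class="hoh">hoh</a>',
        '<a href="/foo/foo/foo/foo/foo" class="foo">foo </a>',
        '<a href="/bar/bar/bar/bar/bar" class="bar">bar</a>',
        '<a href="/lol/lol/lol/lol/lol" class="lol">lol</a>',
        '<a href="/koo/koo/koo/koo/koo" class="koo">koo</a>',
        '';

    # Fourth test from the question: greedy .* in front.
    # The href= in the "koo" anchor is tried first, but no >lol follows it,
    # so .* backtracks to the href= in the "lol" anchor.
    $content =~ m~.*href="(.*?)".*?>lol~s or die "no match";
    my ($start, $href) = ($-[0], $1);

    print "match starts at offset $start\n";  # prints 0
    print "captured: $href\n";                # prints /lol/lol/lol/lol/lol
    ```

    So the leftmost-match rule still holds; the greedy prefix only changes which href= ends up inside the capture group.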

© 2021 The Archive Base. All Rights Reserved