The Archive Base

Editorial Team
Asked: May 25, 2026

I have the following script, which grabs a webpage, then does a regex to find items I’m looking for:

use warnings;
use strict;
use LWP::Simple;

my $content=get('http://mytempscripts.com/2011/09/temporary-post.html') or die $!;
$content=~s/\n//g;
$content=~s/&nbsp;/ /g;
$content=~/<b>this is a temp post<\/b><br \/><br \/>(.*?)<div style='clear: both;'><\/div>/;
my $temp=$1;


while($temp=~/((.*?)([0-9]{1,})(.*?)\s+(.*?)([0-9]{1,})(.*?)\s+(.*?)([0-9]{1,})(.*?)\s+)/g){
print "found a match\n";
}

This works, but takes a long, long time. When I shorten the regex to the following, I get the results in less than a second. Why does my original regex take so long? How do I correct it?

while($temp=~/((.*?)([0-9]{1,})(.*?)\s+(.*?)([0-9]{1,})(.*?)\s+(.*?)([0-9]{1,})(.*?)\s+)/g){
print "found a match\n";
}
1 Answer

  1. Editorial Team
    Added an answer on May 25, 2026 at 3:46 pm

    Regular expressions are like the sort function in Perl. You think it’s pretty simple because it’s just a single command, but in the end, it uses a lot of processing power to do the job.

    There are certain things you can do to help out:

    1. Keep your syntax as simple as possible.
    2. Precompile your regular expression pattern with qr// if you’re using that regular expression in a loop, especially when the pattern is built from variables. That’ll prevent Perl from having to compile your regular expression on each iteration.
    3. Try to avoid regular expression syntax that has to do backtracking. This usually ends up being the most general matching patterns (such as .*).
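
    As a rough illustration of the three tips above, here is a minimal sketch. The sample lines and the names @lines, $id_re and @ids are made up for this example and are not from the question.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample data, not from the question.
    my @lines = ('id=42 name=foo', 'no digits here', 'id=7 name=bar');

    # Tip 2: compile the pattern once, outside the loop.
    # Tips 1 and 3: \d+ is deliberately narrow. Unlike .*, it cannot
    # swallow the whole line, so there is little to backtrack over.
    my $id_re = qr/id=(\d+)/;

    my @ids;
    for my $line (@lines) {
        push @ids, $1 if $line =~ $id_re;
    }

    print "@ids\n";    # prints "42 7"
    ```

    How much qr// saves depends on the pattern; for a short literal pattern like this one the gain is probably small.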

    The wretched truth is that after decades of writing in Perl, I’ve never mastered the deep dark secrets of regular expression parsing. I’ve tried many times to understand it, but that usually means doing research on the Web, and …well… I get distracted by all of the other stuff on the Web.

    And it’s not that difficult: any half-decent developer with an IQ of 240 and a penchant for sadism should easily be able to pick it up.


    @David W.: I guess I’m confused on backtracking. I had to read your link several times but still don’t quite understand how to implement it (or, not implement it) in my case. – user522962

    Let’s take a simple example:

    my $string = 'foobarfubar';
    $string =~ /foo.*bar.*(.+)/;
    my $result = $1;
    

    What will $result be? It will be r. You see how that works? Let’s see what happens.

    Originally, the regular expression is broken into tokens, and the first token foo.* is used. That actually matches the whole string:

    "foobarfubar" =~ /foo.*/
    

    However, if the first regular expression token consumes the whole string, the rest of the regular expression fails. Therefore, the regular expression matching algorithm has to backtrack, giving back one character at a time:

    "foobarfubar" =~ /foo.*/    #Nothing left, so /bar/ doesn't match.
    "foobarfuba" =~ /foo.*/     #"r" left, /bar/ doesn't match.
    "foobarfub" =~ /foo.*/      #"ar" left, /bar/ doesn't match.
    "foobarfu" =~ /foo.*/       #"bar" left, /bar.*/ matches, but nothing remains for /(.+)/.
    "foobarf" =~ /foo.*/        #"ubar" left, /bar/ doesn't match here.
     ...
    "foo" =~ /foo.*/            #"barfubar" left. Now /bar.*/ can match with room to spare!
    

    Now, the same happens for the rest of the string:

    "foobarfubar" =~ /foo.*bar.*/  #But the final /.+/ doesn't match
    "foobarfuba"  =~ /foo.*bar.*/  #And the final /.+/ can match the "r"!
    

    Backtracking tends to happen with the .* and .+ expressions since they’re so loose. I see you’re using non-greedy matches, which can help, but it can still be an issue if you are not careful, especially if you have very long and complex regular expressions.
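
    To see where the characters end up, here is a small sketch that reuses the foobarfubar example but captures the two .* parts as well. The extra capture groups and the variable names are additions for illustration only.

    ```perl
    use strict;
    use warnings;

    my $string = 'foobarfubar';

    # Greedy: the same pattern as in the example, with both .* parts
    # captured so the final division of the string is visible.
    my ($first, $second, $last) = $string =~ /foo(.*)bar(.*)(.+)/;
    print "greedy: first='$first' second='$second' last='$last'\n";
    # prints: greedy: first='' second='fuba' last='r'

    # Non-greedy: .*? takes as little as possible, so the trailing
    # greedy (.+) picks up everything after the first "bar".
    my ($lazy_first, $lazy_second, $lazy_last) =
        $string =~ /foo(.*?)bar(.*?)(.+)/;
    print "lazy:   first='$lazy_first' second='$lazy_second' last='$lazy_last'\n";
    # prints: lazy:   first='' second='' last='fubar'
    ```

    Note that switching to non-greedy quantifiers changes what is captured, not just how much work the engine does. If you want to watch the engine's individual steps, the re pragma (use re 'debug';) prints them, although the output is verbose.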

    I hope this helps explain backtracking.

    The issue you’re running into isn’t that your program doesn’t work, but that it takes a long, long time.

    I was hoping that the general gist of my answer would be that regular expression parsing isn’t as simple as Perl makes it out to be. I can see the command sort @foo; in a program, but forget that if @foo contains a million or so entries, it might take a while. In theory, Perl could be using a bubble sort, which would make the algorithm O(n^2). I hope that Perl is actually using a more efficient algorithm and my actual time will be closer to O(n log n). However, all this is hidden by my simple one-line statement.

    I don’t know if backtracking is an issue in your case, but you’re treating the entire webpage as a single string, which could be very long, and then matching it against another regular expression over and over again. Apparently, that is quite a processor-intensive step, which is hidden by the fact that it’s a single Perl statement (much like sort @foo hides its complexity).
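
    One plausible cause of the slowdown is the long chain of adjacent (.*?) groups: whenever a match attempt fails, the engine can retry with many different ways of dividing the same text among them. The sketch below shows one way to narrow the pattern. It assumes the goal is to find three whitespace-separated tokens that each contain a number, which is a guess at the intent, and the sample string is a made-up stand-in for $temp.

    ```perl
    use strict;
    use warnings;

    # Hypothetical stand-in for $temp; the real page content is not shown.
    my $temp = "a1 b22 c333 d4 e5 f6 ";

    # One "token": optional non-digits, a run of digits, the rest of the
    # word, then whitespace. Each piece matches a restricted class of
    # characters, which leaves far less room to trade characters back
    # and forth than a chain of .*? groups does.
    my $token  = qr/\D*(\d+)\S*\s+/;
    my $triple = qr/$token$token$token/;

    my @found;
    while ($temp =~ /$triple/g) {
        push @found, "$1,$2,$3";    # the three numbers of this match
    }

    print "$_\n" for @found;
    # prints:
    # 1,22,333
    # 4,5,6
    ```

    This is not a drop-in replacement: it captures only the numbers, and it may match slightly different text than the original pattern on real input, so check it against your page before relying on it.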

    Thinking about this on and off over the weekend, you really should not attempt to parse HTML or XML with regular expressions because the approach is so sloppy. You end up with something rather inefficient and fragile.

    In cases like this, you may be better off using something like HTML::Parser, or XML::Simple, which I’m more familiar with but which doesn’t necessarily work with poorly formatted HTML.

    Perl regular expressions are nice, but they can easily get out of our control.
