I have an application that fires several processes. Each process loads an HTML file

Asked: May 27, 2026

I have an application that fires several processes. Each process loads an HTML file and tries to find whether a pattern appears in it, something like this:

keys %{ $self->{TAGS} };    # reset the each() iterator in case an earlier run exited via "last"
OUTER:
while (my ($prov, $arr_ref) = each %{ $self->{TAGS} }) {
    foreach my $tag (@{$arr_ref}) {
        if ($html =~ m/\Q$tag\E/i) {
            $provider = $prov;
            last OUTER;
        }
    }
}

Each key in $self->{TAGS} is a pattern name, and its value is a reference to an array of strings (scalars).

I was profiling the program, and found that this part:

$html =~ m/\Q$tag\E/i

makes my CPU jump to 100%. If I remove it, it barely gets to 10%.

I have only one approach in mind: turning all the scalars (strings) inside each array ref into compiled regexes (qr/.../). I doubt it will improve things much, since I suspect the real cost is the regex actually searching through all the HTML pages, which can be hundreds of bytes in size.
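For reference, the qr// idea described above could be sketched roughly as follows. The %tags data and the compile_tags and find_provider names are made up for illustration and are not part of the original program; the sketch only shows the tags being compiled once instead of on every match attempt.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $self->{TAGS}: provider name => array ref of literal strings.
my %tags = (
    acme   => [ 'acme-widget.js', 'powered by acme' ],
    globex => [ 'cdn.globex.example' ],
);

# Compile every literal once. \Q...\E keeps metacharacters literal, /i ignores case.
sub compile_tags {
    my ($tags) = @_;
    my %compiled;
    for my $prov (keys %$tags) {
        $compiled{$prov} = [ map { qr/\Q$_\E/i } @{ $tags->{$prov} } ];
    }
    return \%compiled;
}

# Return the first provider that has a matching tag, or undef if none matches.
sub find_provider {
    my ($compiled, $html) = @_;
    for my $prov (sort keys %$compiled) {
        for my $re (@{ $compiled->{$prov} }) {
            return $prov if $html =~ $re;
        }
    }
    return;
}

my $compiled = compile_tags(\%tags);
my $provider = find_provider($compiled, '<script src="ACME-Widget.js"></script>');
print defined $provider ? "$provider\n" : "no match\n";
```

Each page is still scanned once per tag, so this mainly saves recompilation rather than scanning time.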

What can I do to improve this section?

SUB-QUESTION: Following the answers below and some testing of my own, I want to sharpen my question. The issue is NOT the regex: I had already tried the index approach before asking this question, and I also tried compiled regexes with qr//. The issue is the size of the HTML files. The contents of $html are HTML text, which is sometimes small and sometimes big. So the real question is: what is the best way (resource-wise) to find out whether a string appears inside a larger string (let's say 1 MB in size)?

Thanks.

1 Answer

Editorial Team · Answer 1 · 2026-05-27T14:55:04+00:00

Using index should improve performance, since it avoids the overhead of the regular-expression engine. Please benchmark it to confirm!

my $html_searchable = lc($html);    # lowercase the document once, outside the loops

...    

while ( ... ) {
  foreach ( ... ) {
    if (index ($html_searchable, lc ($tag)) > -1) {
      ... # we got a match
    }
  }
}
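With the elided parts filled in, a minimal runnable sketch of this approach might look like the following. The %tags data and the find_provider_index name are assumptions made for illustration, standing in for $self->{TAGS} and the surrounding method.

```perl
use strict;
use warnings;

# Hypothetical data standing in for $self->{TAGS}.
my %tags = (
    acme   => [ 'acme-widget.js', 'Powered by ACME' ],
    globex => [ 'cdn.globex.example' ],
);

# Return the first provider whose tag occurs in $html (ignoring case), or undef.
sub find_provider_index {
    my ($tags, $html) = @_;
    my $html_searchable = lc $html;    # lowercase the big string once
    for my $prov (sort keys %$tags) {
        for my $tag (@{ $tags->{$prov} }) {
            # index returns -1 when the substring is absent
            return $prov if index($html_searchable, lc $tag) > -1;
        }
    }
    return;
}

my $provider = find_provider_index(\%tags, '<footer>POWERED BY acme</footer>');
print defined $provider ? "$provider\n" : "no match\n";
```

Returning from the sub replaces the labelled "last OUTER" of the original loop.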

If you’d like to speed it up even more, store all your tags as lowercase strings up front so that you don’t have to lc the same string multiple times.
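To check this on your own data, something along these lines could be used with the core Benchmark module. The tag list, the synthetic document of roughly 1 MB, and the match_regex and match_index names are all assumptions for illustration; real pages and tags may behave differently, so treat the numbers as indicative only.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical tags, lowercased once up front.
my @tags = map { lc $_ } ('Powered by ACME', 'cdn.globex.example', 'No-Such-Marker');

# Synthetic document of about 1 MB with a single marker at the very end.
my $html = ('<div class="row">filler text</div>' x 30_000) . '<p>POWERED BY acme</p>';

# Case-insensitive regex match, one scan per tag.
sub match_regex {
    my ($html, $tags) = @_;
    for my $tag (@$tags) {
        return $tag if $html =~ /\Q$tag\E/i;
    }
    return;
}

# Lowercase the document once, then use index with the pre-lowercased tags.
sub match_index {
    my ($html, $tags) = @_;
    my $lower = lc $html;
    for my $tag (@$tags) {
        return $tag if index($lower, $tag) > -1;
    }
    return;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    regex => sub { match_regex($html, \@tags) },
    index => sub { match_index($html, \@tags) },
});
```

If the same document is checked many times, the lc call on $html can also be moved out of match_index so it is paid only once.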

Documentation: perldoc -f index, perldoc -f lc
