Can someone help me rewrite this regex to be non-exponential? I’m using perl to

Asked: May 23, 2026

Can someone help me rewrite this regex to be non-exponential?

I’m using Perl to parse email data, and I want to extract the email addresses from it. Here is a shortened version of the regex that I’ve been using:

my $email_address = qr/(?:[^\s@<>,":;\[\]\\]+?|"[^\"]+?")@/i;

For simplicity I’ve removed the later domain part of the regex. (It isn’t causing any problems.)

This matches the local part of an RFC-compliant email address: either a run of characters that are not email metacharacters, or a “quoted” string, followed by @. Combining two multi-character patterns with the alternation (|) creates an exponential problem.

The problem shows up when I run this against a single line of data that is nearly half a million characters long:

$ wc line7.txt 
1    221 497819 line7.txt

(I’m sorry but I cannot provide input data at this time, I may be able to mock some up later.)

Much like rewriting (a*b*)* to (a|b)*, I need to rewrite this regex.
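As a quick sanity check of that analogy, here is a minimal sketch that compares the two toy patterns on every string over a small alphabet. The alphabet {a, b, c} and the length limit of 5 are arbitrary choices for illustration; it only shows that the two forms accept the same strings, not how their running times differ.

```perl
use strict;
use warnings;
# The nested form can match the empty string on each iteration, which Perl
# may warn about; silence that warning category for this demo.
no warnings 'regexp';

my $nested = qr/^(?:a*b*)*$/;   # nested quantifiers
my $flat   = qr/^(?:a|b)*$/;    # single quantifier over an alternation

# Build every string over {a, b, c} with length 0 through 5.
my @strings = ('');
my @layer   = ('');
for my $len (1 .. 5) {
    @layer = map { my $s = $_; map { $s . $_ } qw(a b c) } @layer;
    push @strings, @layer;
}

# Count the strings on which the two patterns give different verdicts.
my $disagreements = grep {
    ($_ =~ $nested ? 1 : 0) != ($_ =~ $flat ? 1 : 0)
} @strings;

print "checked ", scalar(@strings), " strings, disagreements: $disagreements\n";
```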

Splitting it into two separate regexes would solve my problem, but it would mean more code changes than I am willing to make at this point.

The eventual target machine is part of a Hadoop cluster, so I would like to avoid CPAN modules that don’t come with Hadoop’s version of Perl. (I’ll have to check whether Email::Find can even be used.) This is a problem I encountered at work.

1 Answer

Editorial Team · Answer 1 · 2026-05-23T07:07:12+00:00

qr/(?:(?>[^\s@<>,":;\[\]\(\)\\]+)|"[^\"]{0,62}")@/i

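The (?>expression) part is an atomic group: once it has matched, the engine will not backtrack into it. That should be safe here because the character class excludes both @ and ", so giving characters back could never let the @ match, and there can be no overlap between the non-quoted part and the quoted part.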
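To make the effect of an atomic group concrete, here is a minimal sketch using a toy pattern rather than the email regex. The input 'aaaa' and the variable names are made up for illustration. Note that the quantifier has to sit inside the group for the whole run to be atomic; with the quantifier outside, the engine can still give back whole iterations.

```perl
use strict;
use warnings;

# Plain greedy run: the engine may give characters back one at a time.
my $plain = qr/^a+a$/;

# Atomic run: once (?>a+) has matched, none of its characters are given back.
my $atomic = qr/^(?>a+)a$/;

# Possessive quantifier (Perl 5.10+), shorthand for the atomic group above.
my $possessive = qr/^a++a$/;

# Quantifier outside the group: each single "a" is atomic, but the loop
# itself can still drop iterations, so backtracking is not prevented.
my $per_char = qr/^(?>a)+a$/;

my $input = 'aaaa';
my $plain_matches      = $input =~ $plain      ? 1 : 0;
my $atomic_matches     = $input =~ $atomic     ? 1 : 0;
my $possessive_matches = $input =~ $possessive ? 1 : 0;
my $per_char_matches   = $input =~ $per_char   ? 1 : 0;

print "plain: $plain_matches, atomic: $atomic_matches, ",
      "possessive: $possessive_matches, per-char: $per_char_matches\n";
# prints "plain: 1, atomic: 0, possessive: 0, per-char: 1"
```

The atomic and possessive forms fail here because the run swallows every "a" and refuses to return one for the trailing "a". In the email pattern that cannot cost a match, since the character following the run must be @, which the class never consumes.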

I removed the lazy repeats (+?) because each branch of the alternation already stops at the @ or the closing " respectively. Quoted phrases could be a large source of backtracking, so I looked at the Wikipedia article, which states that the local part (before the @) can be at most 64 characters long. Subtracting the two quotes yields {0,62}; if ""@ is not valid, change it to {1,62}.

I do not intend for this to be a completely functional email parser. That is your job. I am only providing help with the catastrophic backtracking. Best of luck!
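As a rough usage sketch, the following applies a variant of the pattern to a made-up line and collects each local part that is followed by @. The sample line, the added capture group, and the variable names are assumptions for illustration. The + is placed inside the atomic group, and @ is escaped to rule out any array interpolation.

```perl
use strict;
use warnings;

# Local-part pattern: an atomic run of ordinary characters, or a quoted
# string of at most 62 characters, followed by a literal @.
my $local_part = qr/((?>[^\s\@<>,":;\[\]\(\)\\]+)|"[^"]{0,62}")\@/;

# Made-up sample line; not real data.
my $line = 'From: "Jane Doe" <jane.doe@example.com>, '
         . '"odd name"@example.org; bob@example.net';

# In list context, a global match returns the capture from every match.
my @found = $line =~ /$local_part/g;

print "$_\n" for @found;
# prints:
#   jane.doe
#   "odd name"
#   bob
```

The bounded {0,62} keeps an unmatched quote from scanning an arbitrarily long stretch of the line before the attempt fails.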
