I noticed that it is very slow for a Regex to finish a XML

Question


Asked: May 11, 2026

I noticed that a regex takes a very long time to finish on an XML file with 3000 lines [1]:

\(<Annotation\(\s*\w\+='[^']\{-}'\s\{-}\)*>\)\@<=\(\(<\/Annotation\)\@!\_.\)\{-}'MATCH\_.\{-}\(<\/Annotation>\)\@=

I always thought that regexes are efficient. Why does this one take so long to finish?

[1] How can I repeatedly match from A until B in VIM?


1 Answer

score 0 · Answer 1 · 2026-05-11T13:02:50+00:00

Whether a regular expression is efficient depends on the expression itself. What makes it inefficient is backtracking. To avoid backtracking, the regular expression has to be as distinct as possible.

Let's take the regular expression a.*b as an example and apply it to the string abcdef. The algorithm first matches the literal a in a.*b to the a in abcdef. Next, the expression .* is processed. In the normal greedy mode, where quantifiers are expanded to the maximum, it matches the whole rest, bcdef, of abcdef. Then the last literal b in a.*b is processed. But since the end of the string has already been reached and a quantifier is in use, the algorithm backtracks to try to match the whole pattern. The match of .* (bcdef) is shortened by one character (bcde) and the algorithm tries to match the rest of the pattern. But the b in a.*b doesn't match the f in abcdef. So .* is shortened by one character at a time until it matches the empty string (thus . is repeated zero times) and the b in a.*b matches the b in abcdef.

As you can see, a.*b applied to abcdef needs 6 attempts for .* (the initial greedy match plus 5 backtracking steps) until the whole regular expression matches. But if we alter the regular expression and make it distinct by using a[^b]*b instead, no backtracking is necessary and the regular expression matches on the first attempt.
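To make the counting concrete, here is a minimal Python sketch that simulates only this one pattern shape (a literal, then a greedy starred character class, then a literal). It is not how Vim or any real regex engine is implemented. The function name count_greedy_attempts and the helpers any_char and not_b are made up for this illustration.

```python
def count_greedy_attempts(text, first, repeatable, last):
    """Simulate greedy matching of  first (class)* last  at the start of text.

    `repeatable` is a predicate saying whether a character belongs to the
    starred class. Returns (matched, attempts), where `attempts` counts how
    often the trailing literal `last` is tried.
    """
    if not text.startswith(first):
        return False, 0
    start = len(first)

    # Greedy expansion: consume as many characters as the class allows.
    end = start
    while end < len(text) and repeatable(text[end]):
        end += 1

    # Try the trailing literal; on failure give back one character and retry.
    attempts = 0
    while end >= start:
        attempts += 1
        if text.startswith(last, end):
            return True, attempts
        end -= 1
    return False, attempts


def any_char(c):
    """Stands in for '.' (any character except a newline)."""
    return c != "\n"


def not_b(c):
    """Stands in for '[^b]'."""
    return c != "b"


print(count_greedy_attempts("abcdef", "a", any_char, "b"))  # (True, 6)
print(count_greedy_attempts("abcdef", "a", not_b, "b"))     # (True, 1)
```

The first call corresponds to a.*b and the second to a[^b]*b. The only difference is the character class, yet the number of attempts drops from six to one.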

And if you are now considering lazy quantifiers instead, I have to tell you that these rules apply to every quantifier, both greedy and lazy. The difference is that a greedy quantifier first expands the match to the maximum and then backtracks by shrinking it one character at a time, whereas a lazy quantifier first takes the minimum match and then extends it one character at a time.
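The same kind of toy simulation can be written for the lazy case. The following Python sketch is again only an illustration for the pattern shape literal, lazy starred class, literal, and does not reflect a real engine's internals. The name count_lazy_attempts and the sample string axxxxxb are assumptions chosen for this example.

```python
def count_lazy_attempts(text, first, repeatable, last):
    """Simulate lazy matching of  first (class)*? last  at the start of text.

    Returns (matched, attempts), where `attempts` counts how often the
    trailing literal `last` is tried.
    """
    if not text.startswith(first):
        return False, 0

    # Lazy start: the starred class initially matches nothing.
    end = len(first)
    attempts = 0
    while True:
        attempts += 1
        if text.startswith(last, end):
            return True, attempts
        # The trailing literal failed: extend the class by one character, if allowed.
        if end < len(text) and repeatable(text[end]):
            end += 1
        else:
            return False, attempts


def any_char(c):
    """Stands in for '.' (any character except a newline)."""
    return c != "\n"


print(count_lazy_attempts("abcdef", "a", any_char, "b"))   # (True, 1)
print(count_lazy_attempts("axxxxxb", "a", any_char, "b"))  # (True, 6)
```

On abcdef the lazy variant succeeds immediately, but on axxxxxb it needs six attempts because the b is far from the a. A lazy quantifier therefore only moves the retries to a different kind of input. It does not remove them.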
