This arose from a discussion on formalizing regular expressions syntax. I’ve seen this behavior

Question

Asked: June 17, 2026 (2026-06-17T19:25:14+00:00)

This arose from a discussion on formalizing regular expressions syntax. I’ve seen this behavior with several regular expression parsers, hence I tagged it language-agnostic.

Take the following expression (adjust it for your favorite language):

replace("input", "(.*)*", "$1")

It returns an empty string. Why?

Even more curiously, the expression replace("input", "(.*)*", "A$1B") returns the string ABAB. Why the double empty match?

Disclaimer: I know about backtracking and greedy matches, but the rules laid out by Jeffrey Friedl seem to dictate that .* matches everything and that no further backtracking or matching is done. Then why is $1 empty?

Note: compare with (.+)*, which returns the input string. However, http://regexhero.com shows that there are still two matches, which seems odd for the same reasons as above.

1 Answer

Editorial Team · Answer 1 · 2026-06-17T19:25:16+00:00

Let’s see what happens:

(.*) matches "input".
"input" is captured into group 1.
The regex engine is now positioned at the end of the string. But since (.*) is repeated, another match attempt is made:
(.*) matches the empty string after "input".
The empty string is captured into group 1, overwriting "input".
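$1 now contains the empty string. The replace operation then finds one more match, the empty string at the end of the input, where $1 is empty again; that is why A$1B produces ABAB.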
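The steps above can be checked with a short sketch. Python is used here only as an assumption, since the question is language-agnostic; Python writes the backreference as \1 instead of $1, and the results noted in the comments are what I would expect on CPython 3.7 or later. Other engines may differ (some reject an empty iteration of a repeated group and keep "input" in group 1).

```python
import re

# Match once at the start of the string and inspect what group 1 holds.
m = re.match(r"(.*)*", "input")

overall_span = m.span(0)  # span of the whole match
group_span = m.span(1)    # span of the LAST iteration of the group
group_text = m.group(1)   # text captured by that last iteration

# Expected on CPython 3.7+: (0, 5) (5, 5) ''
print(overall_span, group_span, repr(group_text))

# The replacement from the question, with \1 in place of $1.
result = re.sub(r"(.*)*", r"A\1B", "input")

# Expected on CPython 3.7+: ABAB
print(result)
```

The span (5, 5) for group 1 shows that the last iteration of the group matched the empty string at the end of the input, overwriting the earlier capture of "input".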

A good question from the comments:

Then why does replace("input", "(input)*", "A$1B") return "AinputBAB"?

(input)* matches "input". It is replaced by "AinputB".
(input)* matches the empty string. It is replaced by "AB" ($1 is empty because it didn’t participate in the match).
Result: "AinputBAB"
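To see the two matches separately, here is a minimal sketch, again assuming Python (CPython 3.7 or later) as the engine. The helper name trace is made up for this illustration. A group that did not participate in the match is reported as None by the match object and is substituted as an empty string by re.sub.

```python
import re

def trace(pattern, text):
    """Hypothetical helper: list (span, group 1) for every match found."""
    return [(m.span(), m.group(1)) for m in re.finditer(pattern, text)]

# First match covers "input"; second match is the empty string at the end,
# where group 1 did not participate and is therefore None.
steps = trace(r"(input)*", "input")
print(steps)

# re.sub replaces both matches; the unmatched group becomes "".
result = re.sub(r"(input)*", r"A\1B", "input")
print(result)

# The (.+)* variant from the question behaves the same way here.
plus_steps = trace(r"(.+)*", "input")
print(plus_steps)
```

The difference from (.*)* is that (input) and (.+) cannot match the empty string, so the group keeps "input" in the first match and stays unset in the second.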
