Is there a convenient way to write a regex that will try to match as much of the regex as possible?
Example:
my $re = qr/a ([a-z]+) (\d+)/;
match_longest($re, "a") => ()
match_longest($re, "a word") => ("word")
match_longest($re, "a word 123") => ("word", "123")
match_longest($re, "a 123") => ()
That is, $re is treated as a sequence of regular expressions, and match_longest attempts to match as much of this sequence as possible. In a sense, matching never fails; it’s only a question of how much of the matching succeeded. Once one regex in the sequence fails to match, the parts that didn’t match are undef.
I know I could write a function which takes a sequence of regexes and creates a single regex to do the job of match_longest. Here’s an outline of the idea:
Suppose you have three regexes: $r1, $r2 and $r3. The single regex to perform the job of match_longest would have the following structure:
$r = $r1 $r2 $r3 | $r1 $r2 | $r1
Unfortunately, this is quadratic in the number of regexes. Is it possible to be more efficient?
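For comparison, here is a minimal Perl sketch of the prefix-alternation construction outlined in the question, assuming the pieces are given as plain pattern strings with no top-level alternation. `build_prefix_alternation` is a hypothetical name, and Perl's branch-reset group `(?|...)` (Perl 5.10+) is swapped in so that whichever alternative matches reports its captures in $1, $2, and so on.

```perl
use strict;
use warnings;

# Hypothetical helper: build (?|p1p2p3|p1p2|p1) from a list of pieces.
# Longest prefix first, so at a given start position the first alternative
# that succeeds is the longest one. Inside (?|...) every alternative
# numbers its capture groups starting from $1 again (branch reset).
sub build_prefix_alternation {
    my @parts = @_;
    my @alts;
    for my $k (reverse 0 .. $#parts) {
        push @alts, join '', @parts[0 .. $k];
    }
    my $alt = join '|', @alts;
    return qr/(?|$alt)/;
}
```

With n pieces the combined pattern contains n(n+1)/2 copies of them, which is where the quadratic growth comes from.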
You can use the regex
$r = $r1 ($r2 ($r3)?)?
which contains each regex only once. In this example you may also use non-capturing groups (?:...) so that the extra grouping does not interfere with the capture groups in your original regular expressions.
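A minimal Perl sketch of the nested form, assuming the sequence is given as a list of pattern pieces (strings or qr// objects). `build_longest` and this `match_longest` are hypothetical helpers written for illustration; the nesting uses non-capturing groups so only the pieces' own captures are numbered, and trailing undef captures are dropped to line up with the example output in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: build p1(?:p2(?:p3 ...)?)? from a list of pieces,
# so that every piece appears exactly once in the combined pattern.
sub build_longest {
    my @parts = @_;
    die "need at least one piece\n" unless @parts;
    my $tail = '';
    # Wrap from the innermost (last) piece outwards.
    for my $part (reverse @parts[1 .. $#parts]) {
        $tail = "(?:$part$tail)?";
    }
    return qr/$parts[0]$tail/;
}

# Hypothetical helper: return the captures of the longest matching prefix,
# dropping trailing undef captures (the pieces that did not match).
sub match_longest {
    my ($re, $str) = @_;
    return () unless $str =~ $re;    # the first piece itself must match
    my $ngroups = $#+;               # number of capture groups in $re
    my @caps = map {
        defined $-[$_] ? substr($str, $-[$_], $+[$_] - $-[$_]) : undef
    } 1 .. $ngroups;
    pop @caps while @caps && !defined $caps[-1];
    return @caps;
}
```

For example, `match_longest(build_longest('a', ' ([a-z]+)', ' (\d+)'), "a word")` gives ("word"). The match is unanchored, so Perl returns the leftmost match rather than the one that gets furthest through the sequence; anchor the first piece (for instance with ^) if the input must be matched from its start.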