How can I remove capturing from arbitrarily nested sub-groups in a Perl regex string? I’d like to nest any regex inside an enveloping expression that captures the sub-regex as a single whole, along with statically known subsequent groups. Do I need to transform the regex string by hand so that it uses only non-capturing (?:) groups (and hope I don’t mess up), or is there a Perl regex or library mechanism that does this?
```perl
# How do I 'flatten' $regex to protect $2 and $3?
# Searching 'ABCfooDE' for 'foo' works, but '((B|(C))fo(o)?(?:D|d)?)' etc.
# breaks, because the groups inside $regex shift what $3 refers to.
# i.e., how would I effectively turn it into '(?:(?:B|(?:C))fo(?:o)?(?:D|d)?)'?
sub check {
    my ($line, $regex) = @_;
    if ($line =~ /(^.*)($regex)(.*$)/) {
        print "<", $1, "><", $2, "><", $3, ">\n";
    }
}
```
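A possible workaround for `check` on its own terms, assuming only the text before the match, the match itself, and the text after are needed, and that `$regex` contains no backreferences: evaluate the match in list context and take the first, second and last captures. The trailing `(.*$)` group is always the last capture, however many groups `$regex` adds. `check_list` is a hypothetical name used only for this sketch.

```perl
use strict;
use warnings;

# Hypothetical variant of check(): a match in list context returns every
# capture ($1, $2, ..., $N) in order, so the trailing (.*$) group is always
# the last element, no matter how many groups $regex itself contains.
# Caveat: a backreference such as \1 inside $regex would still point at the
# wrong group, because (^.*) now occupies group 1.
sub check_list {
    my ($line, $regex) = @_;
    my @caps = ($line =~ /(^.*)($regex)(.*$)/);
    return unless @caps;
    return ($caps[0], $caps[1], $caps[-1]);
}

for my $regex ('foo', '((B|(C))fo(o)?(?:D|d)?)') {
    my @parts = check_list('ABCfooDE', $regex);
    print '<', join('><', @parts), ">\n" if @parts;
}
```

This sidesteps the renumbering rather than removing it, so it only helps when positions counted from the end are enough.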
Addendum: I am vaguely aware of $&, $`, and $' and have been advised to avoid them if possible, and I don’t have access to ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} in my Perl 5.8 environment. The example above could be split into its two or three pieces with variables like these, and more complex real cases could repeat that step manually, but I’d prefer a general solution if one exists.
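For the simple three-way split, the match-offset arrays `@-` and `@+` (available since Perl 5.6, so usable on 5.8) give the same information as `` $` ``, `$&` and `$'` without their cost. The sketch below assumes that split is all that is needed; `split_on_match` is a hypothetical helper. Because `$regex` is matched without any capturing groups around it, its own group numbers, and any backreferences inside it, stay intact.

```perl
use strict;
use warnings;

# Hypothetical helper: split $line around the first match of $regex without
# $&, $` or $'. $-[0] and $+[0] are the start and end offsets of the whole
# match. The (?:...) wrapper adds no capture group (so \1 etc. inside $regex
# keep their meaning) and avoids Perl's "empty pattern reuses the last
# successful pattern" rule if $regex is ''.
sub split_on_match {
    my ($line, $regex) = @_;
    return unless $line =~ /(?:$regex)/;
    my ($start, $end) = ($-[0], $+[0]);
    return (
        substr($line, 0, $start),              # text before the match
        substr($line, $start, $end - $start),  # the match itself
        substr($line, $end),                   # text after the match
    );
}

my @parts = split_on_match('ABCfooDE', '((B|(C))fo(o)?(?:D|d)?)');
print '<', join('><', @parts), ">\n" if @parts;
```

If statically known groups are appended after `$regex`, they become the highest-numbered groups, so they could be located from the end using `$#+`, which holds the number of groups in the last successful match.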
What I wish existed, and surprisingly (to me at least) does not, is an encapsulating group that makes its contents opaque, so that subsequent positional backreferences see the contents as a single entity and named references are de-scoped. gbacon has a potentially useful workaround for Perl 5.10+, and FM shows a manual iterative mechanism for any version that can achieve the same effect in specific cases, but j_random_hacker is right that there is no real language mechanism to encapsulate subexpressions.

Accepted Answer:
In general, you can’t.
Even if you could transform all `(...)`s into `(?:...)`s, this would not work in the general case, because the pattern might require backreferences: e.g. `/(.)X\1/`, which matches any character, followed by an `X`, followed by the originally matched character.

So, absent a Perl mechanism for discarding captured results “after the fact”, there is no way to solve your problem for all regexes. The best you can do (or could do if you had Perl 5.10) is to use gbacon’s suggestion and hope to generate a unique name for the capture buffer.
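To make the backreference problem concrete with the answer's own `/(.)X\1/`, here is a minimal sketch (the variable name `$sub` is only for illustration). It shows that dropping the pattern into the question's `(^.*)(...)(.*$)` envelope silently changes what `\1` refers to, so the match still succeeds but captures the wrong text.

```perl
use strict;
use warnings;

# The sub-pattern on its own: \1 refers to its own (.) group.
my $sub = '(.)X\1';
print "standalone: ", ('aXa' =~ /^(?:$sub)$/ ? "match" : "no match"), "\n";

# Wrapped as in check(): \1 now refers to the outer (^.*) group. That group
# can match the empty string, so the whole pattern still "succeeds", but
# $2 is 'aX' rather than the intended 'aXa'.
if ('aXa' =~ /(^.*)($sub)(.*$)/) {
    print "wrapped: <$1><$2>\n";
}
```

Rewriting the inner `(.)` as `(?:.)` would not help either: `\1` would then have no group of its own to refer to.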