I’m having trouble with a regular expression in PHP that uses a potentially empty backreference. I was hoping that it would work as explained in http://www.regular-expressions.info/brackets.html:
If a backreference was not used in a
particular match attempt (such as in
the first example where the question
mark made the first backreference
optional), it is simply empty. Using
an empty backreference in the regex is
perfectly fine. It will simply be
replaced with nothingness.
However, it seems PHP behaves differently. From http://php.net/manual/en/regexp.reference.back-references.php:
If a subpattern has not actually been
used in a particular match, then any
back references to it always fail.
As a simplified example, I want to match the following two things with this regex:
- {something} … {/something}
- {something:else} … {/something:else}
Where “something” is known ahead of time, and “else” can be anything (or nothing).
So I tried the following regex (“else” hardcoded for simplicity):
preg_match("/\{(something(:else)?)\}(.*?)\{\/something\\2\}/is", $data, $matches)
Unfortunately, if (:else)? doesn’t match, the \2 backreference fails. If I make \2 optional (\2?), then I might match {something:else} … {/something}, which is no good.
Have I run into a limitation of regular expressions (the infamous “you need a parser, not a regex”) or is this fixable?
Test program:
<?php
$data = "{something} ... {/something}
{something:else} ... {/something:else}
{something:else} ... {/something}";
// won't match {something} ... {/something}
preg_match_all("/\{(something(:else)?)\}(.*?)\{\/something\\2\}/is", $data, $matches);
print_r($matches);
// change \\2 to \\2? and it matches too much
preg_match_all("/\{(something(:else)?)\}(.*?)\{\/something\\2?\}/is", $data, $matches);
print_r($matches);
?>
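Why don’t you simply use \1 instead of something\2? Group 1 captures the whole tag name ("something" or "something:else") and always takes part in the match, so a backreference to it never fails.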
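A minimal sketch of that change, reusing the test data from the question. The variable names and the single-quoted pattern string are my own choices, not anything the question requires:

```php
<?php
$data = "{something} ... {/something}
{something:else} ... {/something:else}
{something:else} ... {/something}";

// \1 is the whole tag name captured by group 1. Group 1 always participates
// in the match, so the back reference never refers to an unset subpattern.
$pattern = '/\{(something(:else)?)\}(.*?)\{\/\1\}/is';

preg_match_all($pattern, $data, $matches);

// $matches[0] holds the full matches, $matches[3] the text between the tags.
print_r($matches[0]);
```

With this data I would expect two full matches: the first two lines. The third line, {something:else} ... {/something}, should not match, because the closing tag has to repeat whatever group 1 captured.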
As for the “you need a parser” problem: you do need one if you have to parse nested constructs.
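To illustrate the nesting limitation, here is a small sketch using a hypothetical nested input (the string below is made up for the example). The lazy (.*?) stops at the first closing tag it finds, so the outer opening tag gets paired with the inner closing tag:

```php
<?php
// Hypothetical nested input: an outer block containing an inner block.
$nested = "{something} a {something} b {/something} c {/something}";

$pattern = '/\{(something(:else)?)\}(.*?)\{\/\1\}/is';

preg_match($pattern, $nested, $m);

// The match ends at the FIRST {/something}, not at the outer closing tag.
echo $m[0], "\n";
```

Here the match is "{something} a {something} b {/something}", which cuts the outer block short. Handling that correctly takes something that tracks nesting depth, such as a small parser.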