This question is a little odd, and I have spent a fair while pushing my knowledge of regular expressions to get this as far as it is. However, I’m stuck on the last little bit. The problem is as follows:
I have a string (which denotes a URL in a routing system I’m modifying) that may contain a regular expression to match some segment. For example:
$pattern = "/some/path/to/</[a-z]+/>regex_var1/location";
The important bits to note here are:
- The regular expression is delimited within the string by </ and /> (this is not especially optional unless it's the end of the world for legacy reasons; I would prefer to leave this as is).
- The bit of text after the /> (regex_var1) is a name for the match of this parameter. I need to keep this out of the expression to keep it compatible with the rest of the system; suffice it to say it can be ignored in this context.
- This URL pattern would match /some/path/to/another/location
What I want to achieve is to split a given format (example as above) into segments. These segments are used in a backtracking tree traversal to match a Request URI with a controller. At present regular expressions are not supported; my intention is to allow them. In the past each segment was delimited by a /, but I now need / characters inside the contained regular expression. If I use the split in its current form, the expression is split across two segments. For example:
$pattern = "/some/</([a-z]+)(/optional)?/>regex2/location";
$segments = preg_split('/(?<!<)\/(?!>)/', $pattern);
yields 4 segments (not counting the empty leading element):
// print_r($segments)
Array
(
[0] =>
[1] => some
[2] => </([a-z]+)(
[3] => optional)?/>regex2
[4] => location
)
when I actually only want 3:
// print_r($segments)
Array
(
[0] =>
[1] => some
[2] => </([a-z]+)(/optional)?/>regex2
[3] => location
)
I am not interested in matching the whole URL with a regular expression, which would defeat the whole point of the exercise. This problem might seem unwarranted in isolation, but details about why I am after this specific implementation are beyond the scope of the question.
Hm, I cannot see an easy way to do it with a regexp only. You might first parse out the embedded regexes with /<\/.*?\/>[^\/]*/, store them in an array and replace each one with a placeholder that is easy to find yet cannot clash with the rest of the pattern (e.g. $1), then run your split on / and finally reinsert the stored regexes in place of the placeholders.
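A minimal sketch of that placeholder idea, in case it helps. The function name splitPattern() and the NUL-delimited token format are made up for this example and are not part of your routing system. It also assumes that the embedded regex never contains the sequence /> itself and that the pattern string contains no NUL bytes.

```php
<?php
// Hypothetical helper: split a route pattern on '/' while keeping every
// "</regex/>name" segment in one piece.
function splitPattern(string $pattern): array
{
    $tokens = []; // placeholder token => original "</regex/>name" text

    // 1. Replace each "</ ... />name" run with a token containing no '/'.
    //    '~' is used as the delimiter so '/' needs no escaping.
    $masked = preg_replace_callback(
        '~</.*?/>[^/]*~',
        function (array $m) use (&$tokens): string {
            // Assumption: NUL bytes never occur in a real route pattern.
            $token = "\0" . count($tokens) . "\0";
            $tokens[$token] = $m[0];
            return $token;
        },
        $pattern
    );

    // 2. With the regexes hidden, a plain split on '/' is safe.
    $segments = explode('/', $masked);

    // 3. Put the original text back wherever a token appears.
    $search  = array_keys($tokens);
    $replace = array_values($tokens);
    return array_map(
        function (string $segment) use ($search, $replace): string {
            return str_replace($search, $replace, $segment);
        },
        $segments
    );
}

print_r(splitPattern('/some/</([a-z]+)(/optional)?/>regex2/location'));
```

For the second pattern in the question this gives the four elements asked for: an empty string, some, </([a-z]+)(/optional)?/>regex2 and location. If the embedded regex ever needs a literal />, the lazy .*? would stop too early, so you would need a different closing delimiter or an escaping rule.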