I’m sorry for the poor title, but it is a very generic question
I have to match this pattern
;AAAAAAA(BBBBBB,CCCCC,DDDDDD)
- AAAAA = all characters starting from “;” to “(” (both ;( not included)
- BBBBB = all characters starting from “(” to “,” (both (, not included)
- CCCCC = all characters starting from “,” to “,” (both ,, not included)
- DDDDD = all characters starting from “,” to “)” (both ,) not included)
The “all characters between x and y” part is a problem that kills me every time 🙁
I’m using PHP and I have to match all occurrences of this pattern (preg_match_all) that also, sadly, can be on multiple lines
Thank you in advance!
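I would recommend that you do not use an ungreedy quantifier, but instead make each repetition mutually exclusive with its delimiter. What does this mean? It means, for instance, that A can be any character except `(`. This gives a regex built from negated character classes: `[^(]*` for A, `[^,]*` for B and C, and `[^)]*` for D, each followed by its literal delimiter. The last `[)]` is not even necessary. In PHP, the pattern then goes into preg_match_all as usual.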
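A minimal sketch of how this might look in PHP. The sample `$input` string and the variable names are assumptions for illustration, not part of the question:

```php
<?php
// Hypothetical sample input; the second entry spans several lines.
$input = ";first(a,b,c)\n;second(one,\ntwo,\nthree)";

// Each repetition excludes the delimiter that follows it, so no lazy
// quantifier is needed. Escaping is a matter of taste: [(] and [)] match
// literal parentheses, exactly as \( and \) would.
$pattern = '/;([^(]*)[(]([^,]*),([^,]*),([^)]*)[)]/';

preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[1] = A, $m[2] = B, $m[3] = C, $m[4] = D
    printf("A=%s B=%s C=%s D=%s\n", $m[1], $m[2], $m[3], $m[4]);
}
```

Because a negated character class also matches newlines, entries spread over multiple lines are matched without any extra modifier. The line breaks themselves end up inside the captured values, so you may want to trim() them afterwards.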
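My escaping technique is a matter of taste. This regex is of course equivalent to the variant that escapes the parentheses with backslashes (`\(` and `\)`) instead of wrapping them in character classes (`[(]` and `[)]`).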
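For comparison, here is a sketch of both escaping styles side by side. The variable names and the sample input are assumptions for illustration:

```php
<?php
// Character-class escaping: [(] and [)] stand for literal parentheses.
$classEscaped = '/;([^(]*)[(]([^,]*),([^,]*),([^)]*)[)]/';

// Backslash escaping: \( and \) stand for the same literal parentheses.
$backslashEscaped = '/;([^(]*)\(([^,]*),([^,]*),([^)]*)\)/';

// Hypothetical sample input taken from the shape described in the question.
$input = ';AAAAAAA(BBBBBB,CCCCC,DDDDDD)';

preg_match_all($classEscaped, $input, $a);
preg_match_all($backslashEscaped, $input, $b);

// Both patterns should produce identical match arrays.
var_dump($a === $b);
```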
But I think that looks a lot more mismatched/unbalanced than the other variant. Take your pick!
Finally, as for why this approach is better than using ungreedy (lazy) quantifiers: here is some good, general reading. Basically, when you use ungreedy quantifiers, the engine still has to backtrack. It tries one repetition first, then notices that the `(` after it doesn’t match. So it has to go back into the repetition and consume another character. But then the `(` still doesn’t match, so it goes back to the repetition again. With this approach, however, the engine consumes as much as possible when it enters the repetition for the first time. And once all non-`(` characters are consumed, the engine can match the following `(` right away.
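To make the comparison concrete, here is a small sketch that runs a lazy pattern and the mutually exclusive pattern against the same text. The input and variable names are assumptions for illustration. On well-formed input like this, both should capture the same groups; the difference lies in how the engine gets there, not in the result:

```php
<?php
// Hypothetical multi-line input.
$input = ";name(first,\nsecond,\nthird)";

// Lazy version: needs the s modifier so that . also matches newlines,
// and the engine extends each .*? one character at a time.
$lazy = '/;(.*?)\((.*?),(.*?),(.*?)\)/s';

// Mutually exclusive version: each negated class consumes everything up
// to its delimiter in one go, and matches newlines without any modifier.
$exclusive = '/;([^(]*)\(([^,]*),([^,]*),([^)]*)\)/';

preg_match_all($lazy, $input, $lazyMatches);
preg_match_all($exclusive, $input, $exclusiveMatches);

// Compare the full match arrays of both patterns.
var_dump($lazyMatches === $exclusiveMatches);
```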