I am trying to become better at regular expressions. I am having a hard time trying to understand what does (?> expression ) means. Where can I find more info on non-backtacking subexpressoins? The description of THIS link says:
Greedy subexpression, also known as a non-backtracking subexpression.
This is matched only once and then does not participate in
backtracking.
this other link: http://msdn.microsoft.com/en-us/library/bs2twtah(v=vs.71).aspx has also a definition of non-backtracking subexpression but I still am having a hard time understanding what it means plus I cannot think of an example where I will use (?>exp)
As always, regular-expressions.info is a good place to start.
Use an atomic group if you want to make sure that whatever has once been matched will stay part of the match.
For example, to match a number of “words” that may or may not be separated by spaces, then followed by a colon, a user tried the regex:
When there was a match, everything was fine. But when there wasn’t, his PC would become non-responsive with 100% CPU load because of catastrophic backtracking because the regex engine would vainly try to find a matching combination of words that would allow the following colon to match. Which was of course impossible.
By using an atomic group, this could have been prevented:
Now whatever has been matched stays matched – no backtracking and therefore fast failing times.
Another good example I recently came across:
If you want to match all numbers that are not followed by an ASCII letter, you might want to use the regex
\d+(?![A-Za-z]). However, this will fail with inputs like123abecause the regex engine will happily return the match12by backtracking until the following character is no longer a letter. If you use(?>\d+)(?![A-Za-z]), this won’t happen. (Of course,\d+(?![\dA-Za-z])would also work)