Are regex atomic groups distributive?
I.e. is (?>A?B?) always equivalent to (?>A?)(?>B?)?
If not, please provide a counterexample.
Atomic groups in general
The atomic group (?>regex1|regex2|regex3) takes only the first successful match within it. In other words, once the group has matched, the engine will not backtrack into it to try a different match.
Regexes are evaluated left-to-right, so you express the order you intend things to match. The engine starts at the first position, trying to make a successful match, backtracking if necessary. Without atomic groups, if any path through the expression would lead to a successful match, then it will match at that position.
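Atomic groups are not distributive. Consider these patterns evaluated over ABC: (?>(AB?))(?>(BC)) (no match) and (?>(AB?)(BC)) (matches ABC). In the first pattern, AB? greedily takes AB, and once that atomic group has closed it cannot give the B back, so BC fails. In the second, the engine can still backtrack within the single group before it completes, so AB? gives up the B and BC matches.
Atomic Groups with all optional components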
But your scenario, where both parts are optional, may be different.
Consider an atomic group with two greedy optional parts, (A)? and (B)?. At any position, if A matches, the engine can move on to evaluate the optional B. Otherwise, if A doesn't match, that's fine too, because it's optional. Therefore, (A)? matches at any position. The same logic applies to the optional B. The remaining question is whether there can be any difference in backtracking.
In the case of all optional parts in one group, (?>A?B?), each part always matches, so there's no reason to backtrack within the atomic group, and the group always matches. Once it has matched, the engine is prohibited from backtracking into it.
In the case of separate atomic groups, (?>A?)(?>B?), each part always matches, and the engine is prohibited from backtracking into either group. This means the results will be the same.
To reiterate, the engine can only use the first possible match in (?>A?)(?>B?), which will always be the same match as the first possible match in (?>A?B?). Thus, if my reasoning is correct, for this special case the matches will be the same for multiple optional atomic groups as for a single atomic group with both optional components.