This question relates to character class subtraction in regular expressions (regex). I am referring to the regex flavour of XPath 2.0 (Second Edition).
When there is a negative group within a character class subtraction, is the subtraction operator (-) applied before or after the negation operator (^)?
The relevant text of the XPath / XML Schema specification is below, but to my mind it reads ambiguously.
For any ·positive character group· or ·negative character group· G,
and any ·character class expression· C, G-C is a valid ·character
class subtraction·, identifying the set of all characters in C(G) that
are not also in C(C).
To be more specific, consider the following three regexes:
- [^abc-[ad]]
- [^abc-[^ad]]
- [abc-[^ad]]
being matched against the haystack text of:
- abcdef
What are the possible match texts (first and subsequent)?
I don’t think that text is ambiguous, if we are lenient enough to read G-C as [G-[C]], and a negative group, ^G, as [^G]. Now it looks clear that the caret is part of the first group, and does not negate both groups.
Therefore, [^abc-[ad]] would match any character that is not a, b or c, and is also not a or d. Against abcdef, that gives e as the first match and f as the second.
Keep in mind, you can easily test to see the behavior :).
As a bonus, .NET regular expressions also support this feature, making it a little easier to test online.
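If no engine with class subtraction is at hand, the reading above can also be modelled directly. This is only a sketch in Python, whose standard re module does not implement class subtraction, so each group is treated as a plain predicate over single characters. The helper names char_group, subtract and matches are made up for this illustration and are not part of any regex API.

```python
# Model of "G-C": characters in C(G) that are not also in C(C).
# Assumption: the caret negates only the group it belongs to.

def char_group(chars, negated=False):
    """Return a predicate for [chars] or, if negated, [^chars]."""
    members = set(chars)
    if negated:
        return lambda ch: ch not in members
    return lambda ch: ch in members

def subtract(g, c):
    """Return a predicate for the subtraction G-[C]."""
    return lambda ch: g(ch) and not c(ch)

def matches(pred, text):
    """List the single-character matches in order of appearance."""
    return [ch for ch in text if pred(ch)]

haystack = "abcdef"

# [^abc-[ad]]
r1 = subtract(char_group("abc", negated=True), char_group("ad"))
# [^abc-[^ad]]
r2 = subtract(char_group("abc", negated=True), char_group("ad", negated=True))
# [abc-[^ad]]
r3 = subtract(char_group("abc"), char_group("ad", negated=True))

print(matches(r1, haystack))  # ['e', 'f']
print(matches(r2, haystack))  # ['d']
print(matches(r3, haystack))  # ['a']
```

Under this reading the three regexes from the question match e and then f, only d, and only a respectively. It is a model of the spec wording rather than a real engine, so it is still worth confirming against an actual XPath or .NET implementation.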
See also: Character Class Subtraction