I need help writing a regex with grouping to match the following six input strings:
- N.A
- N.A.S
- N.C.A
- N.C.A.S
- N.CX.CY.A
- N.CX.CY.A.S
The pattern in English is:
(letters) followed by (nothing or letters or letters dot letters) followed by (letters) followed by (nothing or dot letters)
The result of the regex should be four groups. Given the above examples:
Group 1 is "N" in all cases
Group 2 is empty in cases 1-2, "C" in cases 3-4, and "CX.CY" in cases 5-6
Group 3 is "A" in all cases
Group 4 is "S" in cases 2,4,6 and empty in cases 1,3,5
I have played regex whack-a-mole on this and can get it about half right, but whenever I update the pattern for the other cases I end up breaking the ones that used to work.
A solution would be awesome, but hints or tips are equally appreciated.
Update 2012 March 12
As has been pointed out, inputs 2 & 3 look the same (per the English description) and 4 & 5 look the same.
Clarification:
In the real-world input, placeholder ‘S’ has a known value, which is the literal string ‘Value’. In trying to generalize the problem I made it impossible to solve. The updated English description is:
(letters) followed by (nothing or letters or letters dot letters) followed by (letters) followed by (nothing or “.Value”)
I am attempting to adapt the suggestions below. I had not considered the use of ‘lookaround’, so thanks to everyone who suggested that technique in their examples.
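For reference, here is a minimal sketch of how the updated description might translate to a regex with a negative lookahead. It is written in Python and rests on assumptions not stated above: "letters" is taken to mean ASCII A-Z/a-z, the whole string must match, and group 3 is never itself a trailing "Value". The names `PATTERN` and `split_groups` are made up for illustration.

```python
import re

# Hypothetical pattern for the updated description.
# Assumption: "letters" means ASCII letters only.
PATTERN = re.compile(
    r"([A-Za-z]+)"                         # group 1: leading letters
    r"(?:\.([A-Za-z]+(?:\.[A-Za-z]+)?))?"  # group 2: optional one or two dotted segments
    r"\.((?!Value$)[A-Za-z]+)"             # group 3: letters, but never a trailing "Value"
    r"(?:\.(Value))?"                      # group 4: optional literal "Value"
)

def split_groups(text):
    """Return the four groups as strings ("" when a group is absent), or None if no match."""
    m = PATTERN.fullmatch(text)
    if m is None:
        return None
    # Unmatched optional groups come back as None in Python; normalise them to "".
    return tuple(g or "" for g in m.groups())

for sample in ["N.A", "N.A.Value", "N.C.A", "N.C.A.Value",
               "N.CX.CY.A", "N.CX.CY.A.Value"]:
    print(sample, "->", split_groups(sample))
```

The lookahead `(?!Value$)` is what stops the optional second group from swallowing the real group 3: in "N.A.Value" the engine first tries "A" as group 2, fails because the next segment is a trailing "Value", and backtracks to leave group 2 empty.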
Translating your “pattern in English” to regex syntax, this is the best I got:
Explanation:
It works for cases 1,3,5,6, but for 2,4 the capturing groups are wrong (tested in rubular):
Making the first non-capturing group non-greedy fixes cases 2 and 4 but breaks cases 3 and 5:
I tried a few combinations of greedy/non-greedy groups, but got nothing. IMHO you need to improve your specs for this to be even possible to solve at all…
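To make the ambiguity concrete, here is a small Python sketch of a literal, greedy translation of the original four-part description. It is not necessarily the pattern tested above; `LETTERS`, `LITERAL` and `groups` are hypothetical names, and "letters" is assumed to mean ASCII letters. With only "letters" to go on, "N.A.S" and "N.C.A" have the same shape, so the engine cannot know which segment was meant to be optional.

```python
import re

LETTERS = r"[A-Za-z]+"  # assumption: "letters" means ASCII letters

# Literal, greedy translation of the original description (illustrative only).
LITERAL = re.compile(
    rf"({LETTERS})"                          # group 1
    rf"(?:\.({LETTERS}(?:\.{LETTERS})?))?"   # group 2: optional, one or two segments
    rf"\.({LETTERS})"                        # group 3
    rf"(?:\.({LETTERS}))?"                   # group 4: optional
)

def groups(text):
    """Return the four groups with absent ones as "", or None if the string does not match."""
    m = LITERAL.fullmatch(text)
    return tuple(g or "" for g in m.groups()) if m else None

print(groups("N.A.S"))  # ('N', 'A', 'S', '')  wanted ('N', '', 'A', 'S')
print(groups("N.C.A"))  # ('N', 'C', 'A', '')  as wanted
```

Both inputs are split identically, which is why no mix of greedy and non-greedy quantifiers can satisfy all six cases until something (such as a known literal for the last segment) tells them apart.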