My strings are made of three underscore-separated elements :
– first (letters and digits)
– middle (letters, digits and underscore)
– last (letters and digits)
The last element is optional.
Note : I need to access my groups by their names, not their indices.
Examples :
String : abc_def
first : abc
middle : def
last : None
String : abc_def_xyz
first : abc
middle: def
last: xyz
String : abc_def_ghi_jkl_xyz
first : abc
middle : def_ghi_jkl
last : xyz
I can’t find the right regex…
I have two ideas so far :
Optional group
(?P<first>[a-z]+)_(?P<middle>\w+)(_(?P<last>[a-z]+))?
But the middle group matches until the end of the string :
String : abc_def_ghi_jkl_xyz
first : abc
middle : def_ghi_jkl_xyz
last : None
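The greedy behaviour described above can be reproduced with a minimal sketch like the following. It uses the optional-group pattern exactly as written in the question; the names `GREEDY` and `result` are only illustrative.

```python
import re

# Optional-group pattern from the question, unchanged.
GREEDY = re.compile(r"(?P<first>[a-z]+)_(?P<middle>\w+)(_(?P<last>[a-z]+))?")

match = GREEDY.match("abc_def_ghi_jkl_xyz")

# \w also matches "_", so the greedy middle group runs to the end of the string
# and the optional trailing group is left with nothing to match.
result = match.groupdict()
print(result)
```

Because the trailing group is optional, the regex engine has no reason to backtrack and hand the final element to it.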
Using the ‘|’
(?P<first>[a-z]+)_(?P<middle>\w+)_(?P<last>[a-z]+)|(?P<first>[a-z]+)_(?P<middle>\w+)
This expression is invalid : the first and middle groups are declared twice. I thought I could write an expression reusing the groups matched in the first part of the expression :
(?P<first>[a-z]+)_(?P<middle>\w+)_(?P<last>[a-z]+)|(?P=first)_(?P=middle)
The expression is valid, however strings with just a first and a middle like abc_def are not matched.
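Both observations can be checked with a short sketch, assuming Python's `re` module (the `(?P<name>...)` syntax suggests it). The variable names are illustrative only.

```python
import re

# Alternation that declares "first" and "middle" twice: rejected at compile time.
try:
    re.compile(
        r"(?P<first>[a-z]+)_(?P<middle>\w+)_(?P<last>[a-z]+)"
        r"|(?P<first>[a-z]+)_(?P<middle>\w+)"
    )
    duplicate_compiles = True
except re.error:
    duplicate_compiles = False

# (?P=first) is a backreference: it matches the text the group already captured,
# it does not re-apply the group's sub-pattern.
BACKREF = re.compile(
    r"(?P<first>[a-z]+)_(?P<middle>\w+)_(?P<last>[a-z]+)"
    r"|(?P=first)_(?P=middle)"
)

three_parts = BACKREF.match("abc_def_xyz")  # first alternative succeeds
two_parts = BACKREF.match("abc_def")        # no alternative succeeds

print(duplicate_compiles, three_parts.groupdict(), two_parts)
```

When the second alternative is tried, the first one has failed, so `first` and `middle` hold no captured text and the backreferences cannot match. That would explain why `abc_def` is rejected.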
Note
These strings are actually parts of a path I need to match. It could be paths like :
- /my/path/to/abc_def
- /my/path/to/abc_def/
- /my/path/to/abc_def/some/other/stuf
- /my/path/to/abc_def/some/other/stuf/
- /my/path/to/abc_def_ghi_jkl_xyz
- /my/path/to/abc_def_ghi_jkl_xyz/
- /my/path/to/abc_def_ghi_jkl_xyz/some/other/stuf
- /my/path/to/abc_def_ghi_jkl_xyz/some/other/stuf/
- …
Any idea to solve my problem solely with regular expressions ? Post-processing the matched groups is not an option.
Thank you very much !
Change the middle group to be non-greedy, and add beginning and end-of-string anchors:
^(?P<first>[a-z]+)_(?P<middle>\w+?)(_(?P<last>[a-z]+))?$
By default, the \w+ will match as much as possible, which eats the rest of the string. Adding the ? tells it to match as little as possible.
Thanks to Tim Pietzcker for pointing out the anchor requirements.
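As a rough sketch of the approach described above, the following applies a non-greedy middle group and anchors to the question's optional-group pattern, using Python's `re` module. The `[a-z]+` classes are kept from the question's attempts even though its description mentions digits. `PATH`, `parse_name` and `parse_path` are hypothetical: the path variant is an assumption about how the idea might carry over to the paths in the question, not part of the answer. It replaces the anchors with a leading `/` and a lookahead for `/` or end of string, and it assumes no earlier path component contains an underscore.

```python
import re

# Non-greedy middle group plus ^ and $ anchors.
NAME = re.compile(r"^(?P<first>[a-z]+)_(?P<middle>\w+?)(_(?P<last>[a-z]+))?$")

# Assumed path variant: the component starts after "/" (or at the start of the
# string) and must be followed by "/" or the end of the string.
PATH = re.compile(
    r"(?:^|/)(?P<first>[a-z]+)_(?P<middle>\w+?)(_(?P<last>[a-z]+))?(?=/|$)"
)


def parse_name(text):
    """Return the named groups for a bare string, or None if it does not match."""
    match = NAME.match(text)
    return match.groupdict() if match else None


def parse_path(path):
    """Return the named groups for the first matching path component, or None."""
    match = PATH.search(path)
    return match.groupdict() if match else None


for sample in ("abc_def", "abc_def_xyz", "abc_def_ghi_jkl_xyz"):
    print(sample, parse_name(sample))

print(parse_path("/my/path/to/abc_def_ghi_jkl_xyz/some/other/stuf"))
```

The anchor (or lookahead) is what forces the non-greedy group to keep growing: without it, `\w+?` would stop after a single character and the match would end there.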