I use Python regular expressions (the `re` module) in my code and noticed different behaviour in these cases:
re.findall(r'\s*(?:[a-z]\))?[^.)]+', 'a) xyz. b) abc.') # non-capturing group
# results in ['a) xyz', ' b) abc']
and
re.findall(r'\s*(?<=[a-z]\))?[^.)]+', 'a) xyz. b) abc.') # lookbehind
# results in ['a', ' xyz', ' b', ' abc']
What I need to get is just ['xyz', 'abc']. Why do the examples behave differently, and how do I get the desired result?
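The reason `a` and `b` are included in the second case is that a lookaround doesn't consume any characters, and the `?` makes `(?<=[a-z]\))` optional, so the assertion never has to hold. At the start of the string there is no `a)` behind the current position, so the optional lookbehind is skipped and `[^.)]+` matches `a`. Now you are at `)`, which `[^.)]+` cannot match, so the next match starts at the following space: `\s*` takes the space, the lookbehind is skipped again, and `[^.)]+` matches `xyz`. The same thing is repeated with `b) abc`.

Remove the `?` from the second case and the labels are no longer matched: the result is `[' xyz', ' abc']`. The leading space remains because the lookbehind only succeeds immediately after `)`, so `\s*` matches nothing and `[^.)]+` picks up the space. To get exactly `['xyz', 'abc']`, put `\s*` after the lookbehind and capture the rest: `(?<=[a-z]\))\s*([^.)]+)`.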
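As a quick check, here is a minimal sketch that runs the lookbehind pattern with and without the `?` on the sample string from the question. The third pattern, which moves `\s*` after the lookbehind and adds a capture group, is one possible variant rather than the only way to do it; the variable names are just for illustration.

```python
import re

text = 'a) xyz. b) abc.'

# Optional lookbehind: the assertion can be skipped, so the labels match too.
optional = re.findall(r'\s*(?<=[a-z]\))?[^.)]+', text)
print(optional)  # ['a', ' xyz', ' b', ' abc']

# Mandatory lookbehind: a match can only start right after "x)".
# \s* then matches nothing, and [^.)]+ takes the space along with the word.
required = re.findall(r'\s*(?<=[a-z]\))[^.)]+', text)
print(required)  # [' xyz', ' abc']

# Variant (an assumption, not from the original answer): consume the
# whitespace after the lookbehind and capture only the text that follows.
# With one capture group, findall returns the group instead of the full match.
stripped = re.findall(r'(?<=[a-z]\))\s*([^.)]+)', text)
print(stripped)  # ['xyz', 'abc']
```

Calling `.strip()` on each item of the second result would work as well if you would rather keep the pattern without a capture group.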