In Python 2.6, it seems that the end-of-string markers $ and \Z are not compatible with group expressions. For example,
import re
re.findall("\w+[\s$]", "green pears")
returns
['green ']
(so $ effectively does not work). And using
re.findall("\w+[\s\Z]", "green pears")
results in an error:
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/re.pyc in findall(pattern, string, flags)
175
176 Empty matches are included in the result."""
--> 177 return _compile(pattern, flags).findall(string)
178
179 if sys.hexversion >= 0x02020000:
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/re.pyc in _compile(*key)
243 p = sre_compile.compile(pattern, flags)
244 except error, v:
--> 245 raise error, v # invalid expression
246 if len(_cache) >= _MAXCACHE:
247 _cache.clear()
error: internal: unsupported set operator
Why does it work that way, and how can I work around it?
A
A [..] expression is a character group, meaning it’ll match any one character contained therein. You are thus matching a literal $ character. A character group always applies to one input character, and thus can never contain an anchor.

If you wanted to match either a whitespace character or the end of the string, use a non-capturing group instead, combined with the | or selector, for example \w+(?:\s|$).

Alternatively, look at the \b word boundary anchor. It’ll match anywhere a run of \w characters starts or ends (so it anchors to points in the text where a \w character is preceded or followed by a \W character, or is at the start or end of the string).
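To make the points above concrete, here is a minimal sketch using the sample string from the question. The variable names are only illustrative, and the second sample string (with a trailing $) is an assumption added to show the literal match. The patterns are written as raw strings so the backslashes reach the regex engine unchanged.

```python
import re

# Inside [...], $ is just a literal dollar sign, not an anchor,
# so it only matches when the text really contains a "$".
literal_dollar = re.findall(r"\w+[\s$]", "green pears$")

# A non-capturing group with | lets the anchor stay outside any
# character group: match whitespace OR the end of the string.
with_group = re.findall(r"\w+(?:\s|$)", "green pears")

# The same idea works with \Z in place of $.
with_z = re.findall(r"\w+(?:\s|\Z)", "green pears")

# \b is zero-width, so the trailing whitespace is not part of the match.
with_boundary = re.findall(r"\w+\b", "green pears")

print(literal_dollar)   # ['green ', 'pears$']
print(with_group)       # ['green ', 'pears']
print(with_z)           # ['green ', 'pears']
print(with_boundary)    # ['green', 'pears']
```

Note that the non-capturing form keeps the trailing space in the first match, while the \b form does not. Which one is preferable depends on whether the whitespace should be part of the result.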