I’m trying to write a regex that returns a list of the words that begin with barbar, in any case. It should return only the matching part, not the whole word. For example, from the string
string = u'baRbarus, semibarbarus: qui BARbari sunt, alteres BARBARos non sequuntur!'
# output is...
>>> ['baRbar', 'BARbar', 'BARBAR']
I’ve tried this code:
re.compile(ur"([\A\b]*)(barbar)", re.UNICODE | re.IGNORECASE).findall(string)
# it returns...
[(u'', u'baRbar'), (u'', u'barbar'), (u'', u'BARbar'), (u'', u'BARBAR')]
It seems that I misunderstood something. Could you help me, please? It would also be great if you could recommend some good tutorials on the re module. It’s too hard to understand re from the default Python documentation. Thanks!
The following regex is sufficient for what you want to do (as long as the flags are set):
\bbarbar
Example:
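A minimal sketch of \bbarbar applied to the string from the question. It assumes Python 3, where str patterns are Unicode-aware by default, so only re.IGNORECASE is passed; the variable names text, pattern and matches are just illustrative.

```python
import re

text = u'baRbarus, semibarbarus: qui BARbari sunt, alteres BARBARos non sequuntur!'

# \b anchors the match at a word boundary, so the "barbar" inside
# "semibarbarus" is skipped. The pattern has no capture groups, so
# findall() returns the matched text itself rather than tuples.
pattern = re.compile(r"\bbarbar", re.IGNORECASE)
matches = pattern.findall(text)

print(matches)  # ['baRbar', 'BARbar', 'BARBAR']
```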
Here are some comments on your current regex which may clarify why \bbarbar does the job:
- [\A\b]: \A is normally the start of the string and \b is a word boundary, but inside a character class \b becomes a backspace, and I’m not really sure what \A becomes.
- [\A\b]*: This is why your regex matched ‘semibarbarus’. The * means 0 or more, so that portion is not required to match. If you dropped the * and fixed the above problem, it would work.
- ([\A\b]*)(barbar): Multiple groups mean that re.findall() will return a tuple of the groups, rather than just the portion you are interested in.
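The points above can be checked with a short sketch. It assumes Python 3 and, to sidestep the unclear \A case, uses [\b] alone in the character class; the shortened text and the variable names are only for illustration.

```python
import re

text = 'baRbarus, semibarbarus: qui BARbari sunt'

# Inside a character class, \b means the backspace character (\x08),
# not a word boundary, so it does not match between ordinary words.
class_matches_backspace = re.search(r"[\b]", "\x08") is not None
class_matches_boundary = re.search(r"[\b]", "a b") is not None

# With two capture groups, findall() returns one tuple per match.
# The first group is optional (*), so it matches the empty string
# and "semibarbarus" is still picked up.
with_groups = re.findall(r"([\b]*)(barbar)", text, re.IGNORECASE)

# With no groups and a real word boundary, findall() returns just
# the matched substrings at the start of words.
without_groups = re.findall(r"\bbarbar", text, re.IGNORECASE)

print(class_matches_backspace, class_matches_boundary)  # True False
print(with_groups)     # [('', 'baRbar'), ('', 'barbar'), ('', 'BARbar')]
print(without_groups)  # ['baRbar', 'BARbar']
```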