I am trying to match a pipe character in a string using a Python regex and I can’t seem to get it to match. I’ve boiled it down to a simplified version.
Let’s say I am looking for the sequence z|a in a string. Here are some possible regexes and the results:
>>> import re
>>> re.match(r'|', 'xyz|abc')
<_sre.SRE_Match object at 0x2d9a850>
>>> re.match(r'z|', 'xyz|abc')
<_sre.SRE_Match object at 0x2d9a780>
>>> re.match(r'|a', 'xyz|abc')
<_sre.SRE_Match object at 0x2d9a850>
>>> re.match(r'z|a', 'xyz|abc')
>>> re.match(r'z\|a', 'xyz|abc')
>>> re.match(r'z\\|a', 'xyz|abc')
>>> re.match(r'z\\\|a', 'xyz|abc')
>>> re.match(r'z[|]a', 'xyz|abc')
>>>
So I can match with |, |a and z| but I can’t find a way to match z|a. Any ideas?
re.match() only looks for a match at the start of the string. Use re.search() instead; with it, your r'z\|a' and r'z[|]a' attempts both find z|a. The patterns you have that do match are matching the empty string: r'|' is empty string or empty string, r'z|' is z or empty string, and r'|a' is empty string or a. All of those will match on any string.
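A minimal sketch of the difference, using the same test string as the question. The variable names are only illustrative:

```python
import re

text = 'xyz|abc'

# re.match only tries the pattern at position 0, so the literal "z|a"
# (which starts at index 2) is not found.
anchored = re.match(r'z\|a', text)

# re.search scans the whole string and finds the escaped pipe.
found = re.search(r'z\|a', text)

# An unescaped | is alternation: r'z|' means "z or the empty string",
# so re.match succeeds with an empty match at position 0.
empty = re.match(r'z|', text)

print(anchored)             # None
print(found.group())        # z|a
print(repr(empty.group()))  # ''
```

The character class form r'z[|]a' should behave the same way as r'z\|a' under re.search(), since | has no special meaning inside square brackets.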
More generally, you can use re.escape() on a literal string that you need to include in the middle of a more complex regular expression, so you don't have to figure out how many backslashes you need to escape things.
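As a rough illustration of that approach, the sketch below escapes the literal z|a and embeds it in a larger pattern. The surrounding \w+ pieces are a made-up example of a "more complex" expression, not something from the question:

```python
import re

literal = 'z|a'

# re.escape backslash-escapes regex metacharacters such as the pipe.
escaped = re.escape(literal)

# Hypothetical larger pattern: word characters on either side of the literal.
pattern = r'\w+' + escaped + r'\w+'

m = re.search(pattern, 'xyz|abc')
print(escaped)    # z\|a
print(m.group())  # xyz|abc
```

Building the pattern this way keeps the literal part readable and avoids hand-counting backslashes when the literal comes from user input or a variable.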