I have a regex to match middle names that looks like this:
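```python
import re

first_name = 'Matthew'
last_name = 'Walsh'
new_first_name = ''
new_last_name = ''
# Turn each letter into a (lower|upper) group, e.g. 'M' -> '(m|M)'
for char in first_name:
    new_first_name += '(' + char.lower() + '|' + char.upper() + ')'
for char in last_name:
    new_last_name += '(' + char.lower() + '|' + char.upper() + ')'
# Note: in a non-raw string "\b" is the backspace character, not a word boundary
middle_name_regex_str = "\b?((" + new_first_name + " (?P<middle_name1>[A-Z][^ ]?[a-z]* )?" + new_last_name + ")|(" + new_last_name + " (?P<middle_name2>[A-Z][^ ]?[a-z]* )?" + new_first_name + "))"
```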
Here is a string it works for:
```
MATTHEW B. WALSH, D.M.D.\nBorn:\nAkron, Ohio\nCollege:\nBachelor of Arts, Kenyon College, 1998
```
For this string it works fine and matches the middle name `B.`.
However, I want to be on the safe side and escape the first and last names, but when I add `re.escape` it fails:
```python
middle_name_regex_str = "\b?((" + re.escape(new_first_name) + " (?P<middle_name1>[A-Z][^ ]?[a-z]* )?" + re.escape(new_last_name) + ")|(" + re.escape(new_last_name) + " (?P<middle_name2>[A-Z][^ ]?[a-z]* )?" + re.escape(new_first_name) + "))"
```
Now the regex no longer matches:
```python
regex = re.compile(middle_name_regex_str)
regex.search('MATTHEW B. WALSH, D.M.D.\nBorn:\nAkron, Ohio\nCollege:\nBachelor of Arts, Kenyon College, 1998')
```
This returns `None`.
Shouldn't `re.escape` be safe to use, in the sense that it does not alter the behavior of the expression? How could adding backslashes before non-alphanumeric characters stop it from matching?
Any help would be appreciated!
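Using `re.escape` on a string that already contains regex special characters makes the regex look for those characters literally. Your `new_first_name` is already a regex fragment, `(m|M)(a|A)...`, so after escaping, the pattern searches for the literal text `(m|M)(a|A)...` instead of `Matthew` in any case.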
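A quick way to see this is to print the fragment before and after escaping. The sketch below rebuilds the question's `(lower|upper)` fragment for `'Matthew'` (the names `fragment`, `escaped` and `sample` are just illustrative) and compares what each version matches:

```python
import re

# Rebuild the question's fragment: each letter becomes a (lower|upper) group.
fragment = ''.join('(' + c.lower() + '|' + c.upper() + ')' for c in 'Matthew')
print(fragment)  # (m|M)(a|A)(t|T)(t|T)(h|H)(e|E)(w|W)

# re.escape puts a backslash in front of the metacharacters ( | ),
# turning them into literal characters to be matched.
escaped = re.escape(fragment)
print(escaped)  # \(m\|M\)\(a\|A\)\(t\|T\)\(t\|T\)\(h\|H\)\(e\|E\)\(w\|W\)

sample = 'MATTHEW B. WALSH'
print(re.search(fragment, sample))   # a match for 'MATTHEW'
print(re.search(escaped, sample))    # None: the literal "(m|M)..." text is not there
print(re.search(escaped, fragment))  # matches, because fragment *is* that literal text
```

Only `(`, `|` and `)` get a backslash; the letters are left alone, which is why the escaped pattern can only match the literal text of the fragment itself.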
Two suggestions here:
1. If possible, why not use `re.IGNORECASE` to match regardless of case?
2. If not, you can do something like this:

```python
first_name = 'Matthew'
```

Not sure about the order of the formatting arguments here, but you get the point.
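Both suggestions can be sketched roughly as follows. This is a minimal sketch assuming Python 3; `TEXT`, `build_pattern` and `case_group` are hypothetical names, and the leading `\b?` from the question is left out (in a non-raw string it is an optional backspace character, so it has no effect on this text). The key idea in the second option is to escape each character *before* adding the `(`, `|`, `)` group syntax around it.

```python
import re

TEXT = ('MATTHEW B. WALSH, D.M.D.\nBorn:\nAkron, Ohio\nCollege:\n'
        'Bachelor of Arts, Kenyon College, 1998')


def build_pattern(first, last):
    """Combine two ready-to-embed name fragments into the question's pattern.

    Python rejects duplicate group names, hence middle_name1 / middle_name2.
    """
    middle1 = '(?P<middle_name1>[A-Z][^ ]?[a-z]* )?'
    middle2 = '(?P<middle_name2>[A-Z][^ ]?[a-z]* )?'
    return ('((' + first + ' ' + middle1 + last + ')|('
            + last + ' ' + middle2 + first + '))')


# Suggestion 1: escape the plain names and let re.IGNORECASE handle case.
# Caveat: the flag covers the whole pattern, so [A-Z] and [a-z] in the
# middle-name groups become case-insensitive too.
regex1 = re.compile(build_pattern(re.escape('Matthew'), re.escape('Walsh')),
                    re.IGNORECASE)


# Suggestion 2: escape each character *before* wrapping it in a
# (lower|upper) group, so only the name is escaped, not the ( | ) added here.
def case_group(name):
    return ''.join('({}|{})'.format(re.escape(c.lower()), re.escape(c.upper()))
                   for c in name)


regex2 = re.compile(build_pattern(case_group('Matthew'), case_group('Walsh')))

for regex in (regex1, regex2):
    match = regex.search(TEXT)
    print(repr(match.group('middle_name1')))  # 'B. ' (the group includes the trailing space)
```

If the case-insensitivity should apply only to the names, Python 3.6+ also supports a scoped inline flag such as `(?i:...)` around each escaped name, which leaves the `[A-Z]`/`[a-z]` checks in the middle-name groups case-sensitive.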