I’m trying to write a function that searches str for substr while tolerating the different ways of writing special letters, such as æ, ø, å in Danish. For example, you can search for ‘Ålborg’ and the function will return True if str contains, say, ‘Aalborg’.
The function below works, but its performance is unbearable. What would you recommend to improve it?
def danish_tolerating_search(substr, str):
    '''Figure out if substr is in str, taking into account
    possible deviations in writing letters æ, ø, å.
    æ <-> ae a ea
    ø <-> oe o
    å <-> aa a o
    '''
    # normalize input
    substr = substr.lower().replace('aa', u'å')
    str = str.lower()

    # normalized recursive search
    # TODO fix performance
    def s(substr, str):
        if str.find(substr) >= 0: return True
        if substr.find(u'æ') >= 0:
            if s(substr.replace(u'æ', 'ae', 1), str): return True
            elif s(substr.replace(u'æ', 'a', 1), str): return True
            elif s(substr.replace(u'æ', 'ea', 1), str): return True
        if str.find(u'æ') >= 0:
            if s(substr, str.replace(u'æ', 'ae', 1)): return True
            elif s(substr, str.replace(u'æ', 'a', 1)): return True
            elif s(substr, str.replace(u'æ', 'ea', 1)): return True
        if substr.find(u'ø') >= 0:
            if s(substr.replace(u'ø', 'oe', 1), str): return True
            elif s(substr.replace(u'ø', 'o', 1), str): return True
        if str.find(u'ø') >= 0:
            if s(substr, str.replace(u'ø', 'oe', 1)): return True
            elif s(substr, str.replace(u'ø', 'o', 1)): return True
        if substr.find(u'å') >= 0:
            if s(substr.replace(u'å', 'aa', 1), str): return True
            elif s(substr.replace(u'å', 'a', 1), str): return True
            elif s(substr.replace(u'å', 'o', 1), str): return True
        if str.find(u'å') >= 0:
            if s(substr, str.replace(u'å', 'aa', 1)): return True
            elif s(substr, str.replace(u'å', 'a', 1)): return True
            elif s(substr, str.replace(u'å', 'o', 1)): return True
        return False

    return s(substr, str)
I think you should eliminate the recursion altogether. Instead of doing all that find and replace, you could, for example, decide upon a “normal form” of your input strings, convert them accordingly (i.e. replace those “ambiguous” characters) and do a simple substring check.
Note also that you don’t need to call find and replace together; the latter suffices. If the search string isn’t found, replace just won’t replace anything.
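To make the "normal form" idea concrete, here is a minimal sketch in Python 3. The names normalize and danish_search_normalized, and the particular canonical mapping, are my own assumptions rather than anything from the question. It collapses æ, ae, ea, å and aa to "a", and ø and oe to "o", so each input is converted exactly once and the search becomes a single substring test. It deliberately does not cover the å <-> o equivalence from the docstring, since a single normal form would then have to treat every a and o as equal; it is also lossy in the other direction (for instance, "ea" inside an ordinary word is collapsed too).

```python
import re

# Assumed canonical mapping (not from the original post): every spelling
# variant collapses to one plain letter. The "å <-> o" case is NOT handled.
_CANONICAL = {
    "aa": "a", "ae": "a", "ea": "a", "æ": "a", "å": "a",
    "oe": "o", "ø": "o",
}
_VARIANTS = re.compile("aa|ae|ea|oe|æ|ø|å")


def normalize(text):
    """Lower-case text and replace each spelling variant by its canonical letter."""
    # re.sub returns the string unchanged when nothing matches, so no
    # preliminary find() is needed.
    return _VARIANTS.sub(lambda m: _CANONICAL[m.group(0)], text.lower())


def danish_search_normalized(substr, text):
    """Return True if substr occurs in text once both are in normal form."""
    return normalize(substr) in normalize(text)


if __name__ == "__main__":
    print(danish_search_normalized("Ålborg", "I live in Aalborg"))
    print(danish_search_normalized("Koebenhavn", "København er stor"))
```

Each string is scanned once for normalization and once for the substring test, so the cost grows roughly linearly with the input instead of branching on every ambiguous letter as the recursive version does.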