I’m trying to execute this code:
import re
pattern = r"(\w+)\*([\w\s]+)*/$"
re_compiled = re.compile(pattern)
results = re_compiled.search('COPRO*HORIZON 2000 HOR')
print(results.groups())
But Python does not respond. The process takes 100% of the CPU and does not stop. I’ve tried this both on Python 2.7.1 and Python 3.2 with identical results.
Your regex runs into catastrophic backtracking because you have nested quantifiers (`([...]+)*`). Since your regex requires the string to end in `/` (which fails on your example), the regex engine tries every way of dividing the string among the nested repetitions in the vain hope of finding a matching combination. That’s where it gets stuck.

To illustrate, let’s assume `"A*BCD"` as the input to your regex and see what happens:

1. `(\w+)` matches `A`. Good.
2. `\*` matches `*`. Yay.
3. `[\w\s]+` matches `BCD`. OK.
4. `/` fails to match (no characters left to match). OK, let’s back up one character.
5. `/` fails to match `D`. Hum. Let’s back up some more.
6. `[\w\s]+` matches `BC`, and the repeated `[\w\s]+` matches `D`.
7. `/` fails to match. Back up.
8. `/` fails to match `D`. Back up some more.
9. `[\w\s]+` matches `B`, and the repeated `[\w\s]+` matches `CD`.
10. `/` fails to match. Back up again.
11. `/` fails to match `D`. Back up some more, again.
12. `[\w\s]+` matches `B`, repeated `[\w\s]+` matches `C`, repeated `[\w\s]+` matches `D`? No? Let’s try something else.
13. `[\w\s]+` matches `BC`. Let’s stop here and see what happens.
14. `/` still doesn’t match `D`.
15. `[\w\s]+` matches `B`.
16. `/` doesn’t match `C`.
17. The whole group `(...)*` is optional, so try it with zero repetitions.
18. `/` still doesn’t match `B`.

Now that was a string of just three letters. Yours had about 30, and trying all the combinations of those would keep your computer busy until the end of days.
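To get a feel for how fast this grows, here is a minimal sketch that enumerates the ways a run of characters can be cut into consecutive non-empty chunks, which is roughly what `([\w\s]+)*` has to try. The helper `splits` is hypothetical and only models the chunking; a real engine does additional work (it also retries `/` after shorter prefixes), so treat the counts as a lower bound.

```python
def splits(s):
    """Yield every way to cut s into consecutive non-empty chunks."""
    if not s:
        yield []
        return
    # Try the longest first chunk first, the way a greedy quantifier would.
    for i in range(len(s), 0, -1):
        for rest in splits(s[i:]):
            yield [s[:i]] + rest


# The four ways to chunk "BCD", in the same order as the walkthrough above.
for chunks in splits("BCD"):
    print(chunks)

# The count doubles with every extra character: 2 ** (n - 1) ways for n characters.
for n in range(1, 11):
    print(n, sum(1 for _ in splits("A" * n)))
```

At 30 characters that is 2 ** 29 chunkings, more than 500 million, before counting the engine’s other failed attempts.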
I suppose what you’re trying to do is to get the strings before/after `*`, in which case, use
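A minimal sketch of one pattern that avoids the problem, assuming the goal really is just to capture the text on either side of the `*`. Dropping the outer `*` quantifier and the trailing `/` is an assumption on my part, not necessarily the exact pattern intended above.

```python
import re

# Assumed pattern: a single quantifier per group, anchored at the end of the
# string, with no nested repetition and no trailing "/".
pattern = re.compile(r"(\w+)\*([\w\s]+)$")

match = pattern.search("COPRO*HORIZON 2000 HOR")
if match:
    print(match.groups())  # ('COPRO', 'HORIZON 2000 HOR')
```

With only one quantifier over `[\w\s]`, there is just one way to divide the text after the `*`, so a failed match is reported quickly instead of triggering exponential backtracking.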