I have a function that returns True if a string matches at least one
regular expression in a list and False otherwise. The function is called
often enough that performance is an issue.
When running it through cProfile, the function is spending about 65% of
its time doing matches and 35% of its time iterating over the list.
I would think there would be a way to use map() or something, but I can't
think of a way to have it stop iterating after it finds a match.
Is there a way to make the function faster while still having it return
upon finding the first match?
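```python
def matches_pattern(text, patterns):
    # patterns: an iterable of compiled regexes (re.Pattern objects).
    # Return as soon as one of them matches at the start of text.
    for pattern in patterns:
        if pattern.match(text):
            return True
    return False
```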
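The first thing that comes to mind is pushing the loop to the C side by handing a generator expression to the built-in any(). any() stops consuming the generator at the first truthy result, so you keep the early exit you want (each pattern.match call is still made from the generator's Python code, though). Probably you don't even need a separate function for that.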
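Spelled out as a function, assuming patterns is the list of compiled regexes from the question (the sample patterns below are made up for illustration). Inline, the same thing is just any(p.match(text) for p in patterns).

```python
import re

def matches_pattern(text, patterns):
    # any() pulls items from the generator one at a time and stops at the
    # first truthy value, so no patterns after the first match are tried.
    return any(pattern.match(text) for pattern in patterns)

# Hypothetical sample patterns.
patterns = [re.compile(r"foo\d+"), re.compile(r"bar")]
print(matches_pattern("foo42", patterns))  # True
print(matches_pattern("baz", patterns))    # False
```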
Another thing you should try out is to build a single, composite regex using the | alternation operator, so that the whole search happens in one call into the regex engine and the engine has a chance to optimize it for you. You can also create the regex dynamically from a list of string patterns, if this is necessary. Of course you need to have your regexes in string form for that to work. Just profile both of these and check which one is faster 🙂
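A minimal sketch of the dynamic version, assuming the patterns are available as strings and are all meant to be used with the same flags (the pattern list and the name matches_any are hypothetical). Because alternation is tried at the start of the string, combined.match(text) succeeds exactly when at least one of the individual patterns would match there.

```python
import re

# Hypothetical list of patterns kept in string form.
pattern_strings = [r"foo\d+", r"bar", r"ba[zq]"]

# Wrap each pattern in a non-capturing group so it stays a self-contained
# alternative and adds no extra capture groups, then join with "|".
combined = re.compile("|".join(f"(?:{p})" for p in pattern_strings))

def matches_any(text):
    # One call into the regex engine instead of a Python-level loop.
    # Like Pattern.match, this only matches at the start of text.
    return combined.match(text) is not None

print(matches_any("bazooka"))  # True
print(matches_any("qux"))      # False
```

A couple of caveats when combining patterns this way: numbered backreferences such as \1 refer to different groups once the patterns are joined, and inline global flags like (?i) inside one of the wrapped patterns raise an error in recent Python versions because they are no longer at the start of the expression.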
You might also want to have a look at a general tip for debugging regular expressions in Python. This can also help to find opportunities to optimize.
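The linked tip isn't included here. Assuming it is along the lines of Python's re.DEBUG flag (an assumption on my part), this sketch compiles a pattern with re.DEBUG, which makes re.compile print the parsed pattern to stdout. In CPython the output shows that cat|car is parsed as the literals c and a followed by the character set [tr]: the common prefix of the alternatives is factored out, which is one of the optimizations a composite regex can benefit from.

```python
import io
import re
from contextlib import redirect_stdout

# re.DEBUG makes re.compile print its parse of the pattern; capture that
# output so it can be inspected (or just let it go to the terminal).
buffer = io.StringIO()
with redirect_stdout(buffer):
    re.compile(r"cat|car", re.DEBUG)
debug_output = buffer.getvalue()
print(debug_output)
```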
UPDATE: I was curious and wrote a little benchmark comparing these approaches.
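The benchmark code itself is missing from this text. A rough sketch of that kind of comparison with timeit might look like this; the patterns, inputs, and function names are all made up, and it does not reproduce the original numbers.

```python
import re
import timeit

# Hypothetical test data: a few string patterns and some inputs.
pattern_strings = [r"foo\d+", r"bar", r"ba[zq]", r"qux.*end"]
patterns = [re.compile(p) for p in pattern_strings]
inputs = ["foo123", "bazooka", "nothing here", "qux at the end", "xyz"] * 20

def loop_version(text):
    # The original approach: explicit loop with an early return.
    for pattern in patterns:
        if pattern.match(text):
            return True
    return False

def any_version(text):
    return any(pattern.match(text) for pattern in patterns)

def dynamic_regex_version(text):
    # Rebuilds the combined pattern on every call. re caches compiled
    # patterns internally, but the join and the cache lookup still run each time.
    combined = re.compile("|".join(f"(?:{p})" for p in pattern_strings))
    return combined.match(text) is not None

PREBUILT = re.compile("|".join(f"(?:{p})" for p in pattern_strings))

def prebuilt_regex_version(text):
    # Combined pattern built once, up front.
    return PREBUILT.match(text) is not None

candidates = [loop_version, any_version, dynamic_regex_version, prebuilt_regex_version]

if __name__ == "__main__":
    for func in candidates:
        seconds = timeit.timeit(lambda: [func(s) for s in inputs], number=200)
        print(f"{func.__name__:24} {seconds:.3f} s")
```

The numbers depend heavily on the patterns, the inputs, and how early a match is typically found, so it is worth plugging in your real data.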
Judging by the output on my machine, any() doesn't seem to be faster than your original approach. Building up the regex dynamically on every call isn't really fast either. But if you can manage to build up a regex upfront and use it several times, this might result in better performance. You can also adapt this benchmark to test some other options 🙂
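If the pattern list isn't fixed ahead of time, one way to still build the composite regex only once per distinct list is to memoize its construction with functools.lru_cache. This is a sketch under the assumption that the patterns arrive as strings; combined_regex and matches_pattern here are hypothetical names.

```python
import re
from functools import lru_cache

@lru_cache(maxsize=128)
def combined_regex(pattern_strings):
    # pattern_strings must be hashable (a tuple) for lru_cache. The join and
    # compile run once per distinct tuple; repeated tuples hit the cache.
    return re.compile("|".join(f"(?:{p})" for p in pattern_strings))

def matches_pattern(text, pattern_strings):
    # Convert to a tuple so lists can be passed in as well.
    return combined_regex(tuple(pattern_strings)).match(text) is not None
```

Whether this beats simply hoisting a prebuilt regex out of the hot path is worth checking with the same kind of benchmark.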