I have the following two blocks of code:
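import re
import time

def replace_re(text):
    # Time removal of newlines and 4-character whitespace runs via a regex.
    start = time.time()
    new_text = re.compile(r'(\n|\s{4})').sub('', text)
    finish = time.time()
    return finish - start

def replace_builtin(text):
    # Time removal of newlines and single spaces via str.replace.
    start = time.time()
    new_text = text.replace('\n', '').replace(' ', '')
    finish = time.time()
    return finish - start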
Then I call both functions with a text parameter (~500 KB of source code of one web page).
I thought replace_re() would be much faster, but the results are as follows:
replace_builtin() ~ 0.008 sec
replace_re() ~ 0.035 sec (nearly 4.5 times slower!!!)
Why is that?
Rule of Thumb
Fixed-string matches should almost always be faster than regular expression matches. You can search for various benchmarks, or do what you did and run your own, but in general you can assume this holds except (possibly) in some unusual edge cases.
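If you do run your own benchmark, it helps to make both functions do the same work and to repeat the measurement. Note that the two functions in the question are not equivalent: the regex removes newlines and runs of four whitespace characters, while the replace() chain removes every single space. The following is a minimal sketch using the standard timeit module; the sample text, the function names, and the repeat counts are assumptions chosen for illustration, not part of the original code.

```python
import re
import timeit

# Hypothetical sample input: roughly 500 KB of indented, multi-line text.
SAMPLE = "    <div class='row'>\n        <span>text</span>\n    </div>\n" * 9000

# Compile once, outside the timed code, so only matching is measured.
WHITESPACE = re.compile(r"[\n ]")


def strip_with_re(text):
    """Remove every newline and space using a precompiled regex."""
    return WHITESPACE.sub("", text)


def strip_with_replace(text):
    """Remove every newline and space using chained str.replace calls."""
    return text.replace("\n", "").replace(" ", "")


def benchmark(func, text, repeat=5, number=10):
    """Return the best per-call time in seconds over several runs."""
    timings = timeit.repeat(lambda: func(text), repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    # Both approaches must produce the same output for a fair comparison.
    assert strip_with_re(SAMPLE) == strip_with_replace(SAMPLE)
    for func in (strip_with_re, strip_with_replace):
        print(f"{func.__name__}: {benchmark(func, SAMPLE):.6f} s per call")
```

Taking the minimum over several runs reduces noise from other processes. The absolute numbers will vary by machine and Python version.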
Why Regular Expressions are Slower
Fixed-string matches don’t involve backtracking, compilation steps, ranges, character classes, or the host of other features that slow down a regular expression engine. There are certainly ways to optimize regex matches, but I think a regex is unlikely to beat a plain substring search in the common case.
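To see how much of the regex cost comes from compilation rather than matching, you could time the two steps separately. This is only a rough sketch: the helper names and the sample text are made up for illustration, and re.purge() is called so that the module's pattern cache does not hide the compilation work.

```python
import re
import time

PATTERN = r"(\n|\s{4})"  # the pattern from the question


def time_compile(pattern):
    """Time one uncached compilation of the pattern, in seconds."""
    re.purge()  # clear the re module's pattern cache so compile does real work
    start = time.perf_counter()
    compiled = re.compile(pattern)
    return compiled, time.perf_counter() - start


def time_sub(compiled, text):
    """Time one substitution pass over text, in seconds."""
    start = time.perf_counter()
    result = compiled.sub("", text)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    text = "    line of text\n" * 30000  # hypothetical ~500 KB input
    compiled, compile_seconds = time_compile(PATTERN)
    _, sub_seconds = time_sub(compiled, text)
    print(f"compile: {compile_seconds:.6f} s, sub: {sub_seconds:.6f} s")
```

If the substitution time dominates on your input, precompiling the pattern will not close the gap, and the remaining cost is in the matching itself.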
Use the Source
If you want a better explanation, you could always look at the source code for the relevant modules to see how they do what they do. That would certainly give you more information about why any particular implementation performs as it does.