I need to read gigabytes of text so I’m trying to optimize my code. When doing this I found that, for my problem, using a dictionary is faster than if-tests.
check = {'R': '-', 'F': '+'}
seqs = ['R', 'F'] * 100

def check1():
    for entry in seqs:
        if entry == 'R':
            strand = '-'
        if entry == 'F':
            strand = '+'

def check2():
    for entry in seqs:
        strand = check[entry]
Using IPython’s %timeit, I see that looking up in a dictionary is slightly more than twice as fast as using two if-tests:
In [63]: %timeit check1()
10000 loops, best of 3: 38.8 us per loop
In [64]: %timeit check2()
100000 loops, best of 3: 16.2 us per loop
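For anyone without IPython, roughly the same measurement can be sketched with the standard timeit module. The helper name best_us and its default arguments are my own choices for illustration, not part of the original session, and the absolute numbers will vary with machine and Python version.

```python
import timeit

check = {'R': '-', 'F': '+'}
seqs = ['R', 'F'] * 100

def check1():
    # Two independent if-tests per entry, as in the question.
    for entry in seqs:
        if entry == 'R':
            strand = '-'
        if entry == 'F':
            strand = '+'

def check2():
    # One dictionary lookup per entry.
    for entry in seqs:
        strand = check[entry]

def best_us(func, number=10000, repeat=3):
    # Hypothetical helper: best of `repeat` runs, reported as
    # microseconds per call, similar in spirit to %timeit's output.
    best = min(timeit.repeat(func, number=number, repeat=repeat))
    return best / number * 1e6

print("check1: %.1f us per loop" % best_us(check1))
print("check2: %.1f us per loop" % best_us(check2))
```

Taking the minimum over several repeats is the usual way to reduce noise from other processes.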
Since if-tests are so basic, I did not expect a performance difference. Is this well known? Can anybody explain why this is so?
UPDATE
I checked how the two functions above as well as check3() below affect the runtime of my actual code, and there’s no effect on the total time. So either the boost gotten with the dictionary is not so high in a real-world example where the ‘R’ and ‘F’ values need to be re-read from file constantly, or this piece of code is just not part of my bottleneck.
Anyway thanks for the answers!
As with a lot of VM code, it mostly comes down to the number of VM opcodes involved.
You can examine the assembled functions with the dis module.
In 2.6.4, check1 takes around 15-20 opcodes (depending on the code path) for each comparison and branch. check2 takes just 7 (after adding the missing chedict dictionary, declared globally).
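A minimal sketch of that dis inspection, assuming the two functions from the question and a Python 3 interpreter (dis.get_instructions needs 3.4 or later). The opcode names and counts differ between Python versions, so they will not match the 2.6.4 figures exactly. Note that the counts below are static totals for each whole function, not opcodes executed per loop iteration.

```python
import dis

check = {'R': '-', 'F': '+'}
seqs = ['R', 'F'] * 100

def check1():
    for entry in seqs:
        if entry == 'R':
            strand = '-'
        if entry == 'F':
            strand = '+'

def check2():
    for entry in seqs:
        strand = check[entry]

# Print the human-readable bytecode listing for each function.
dis.dis(check1)
dis.dis(check2)

# Count the bytecode instructions in each compiled function
# (a static count over the whole function body).
n_check1 = len(list(dis.get_instructions(check1)))
n_check2 = len(list(dis.get_instructions(check2)))
print("check1:", n_check1, "instructions")
print("check2:", n_check2, "instructions")
```

The listing for check1 shows a compare and a conditional jump for each if-test, while check2 needs only a couple of loads, a subscript operation and a store inside the loop.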