I am getting some very surprising results that seem to indicate that it’s more efficient to wrap an iterator in a list and get its length than to walk it with a lambda. How is this possible? Intuition would suggest that allocating all these lists would be slower.
And yes – I am aware that you can’t always do this as iterators can be infinite. 🙂
from itertools import groupby
from timeit import Timer
data = "abbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccac"
def rle_walk(gen):
    ilen = lambda gen : sum(1 for x in gen)
    return [(ch, ilen(ich)) for ch,ich in groupby(data)]

def rle_list(data):
    return [(k, len(list(g))) for k,g in groupby(data)]
# random-ish data
t = Timer('rle_walk("abbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccac")', "from __main__ import rle_walk; gc.enable()")
print t.timeit(1000)
t = Timer('rle_list("abbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccac")', "from __main__ import rle_list; gc.enable()")
print t.timeit(1000)
# chunky blocks
t = Timer('rle_walk("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbccccccccccccccccccccccccccccccccccccccccccccc")', "from __main__ import rle_walk; gc.enable()")
print t.timeit(1000)
t = Timer('rle_list("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbccccccccccccccccccccccccccccccccccccccccccccc")', "from __main__ import rle_list; gc.enable()")
print t.timeit(1000)
1.42423391342
0.145968914032
1.41816806793
0.0165541172028
Unfortunately your rle_walk has a bug: it takes parameter gen but should take parameter data, so it’s operating on the wrong input (the module-level data rather than the string passed in, which is why both of its timings are nearly identical). Also, it’s unfair to make rle_walk use a lambda where rle_list works inline.
Rewriting rle_walk to take data and count inline, then timing both versions again, we see that walk is slightly slower than list on the blocky data, but slightly faster on the random data. I’d guess the reason is that generators (in Python) impose an overhead compared to the list constructor, and the memory overhead of a 30-item list is too small to impose any significant penalty.
Disassembling the functions provides a little insight.
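The answer's original listings are not reproduced here, so the following is only a minimal sketch of what the corrected functions might look like, together with a dis call for comparing their bytecode. The sample string is an assumption chosen for illustration, and the exact bytecode printed depends on the Python version.

```python
import dis
from itertools import groupby


def rle_walk(data):
    # Count each group by walking it with a generator expression.
    return [(k, sum(1 for _ in g)) for k, g in groupby(data)]


def rle_list(data):
    # Count each group by materialising it into a throwaway list.
    return [(k, len(list(g))) for k, g in groupby(data)]


if __name__ == "__main__":
    sample = "abbbccac"  # illustrative input, not the benchmark data
    # Both versions should agree on the run-length encoding.
    assert rle_walk(sample) == rle_list(sample)
    print(rle_walk(sample))

    # Print the bytecode of each function for a side-by-side look.
    dis.dis(rle_walk)
    dis.dis(rle_list)
```

On recent Python versions, dis.dis also descends into the nested generator expression's code object, which is where the extra code volume of the walk form shows up.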
The much larger code volume for the generator form is going to have some effect. The list form does have an O(log n) factor for constructing the throwaway list, but it’s going to be dominated by the k*O(n) factors in looping the various iterators. One thing to take away from this is that memory allocation is fast, at least for small (sub-page) allocations in an effectively single-threaded environment (which CPython is, since the GIL lets only one thread execute Python bytecode at a time).
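To check these relative costs on your own machine, a sketch along the following lines could be used. The bench helper is hypothetical, and the two inputs are only shaped like the question's "random" and "chunky" strings rather than copied from them. Timings will vary by machine and Python version, so no expected numbers are given.

```python
from itertools import groupby
from timeit import timeit


def rle_walk(data):
    # Generator-based count of each run.
    return [(k, sum(1 for _ in g)) for k, g in groupby(data)]


def rle_list(data):
    # List-based count of each run.
    return [(k, len(list(g))) for k, g in groupby(data)]


def bench(data, number=1000):
    # Hypothetical helper: seconds taken by each variant over `number` calls.
    return {
        "walk": timeit(lambda: rle_walk(data), number=number),
        "list": timeit(lambda: rle_list(data), number=number),
    }


if __name__ == "__main__":
    # Assumed inputs: many short runs versus a few long runs.
    random_ish = "abbbccac" * 16
    chunky = "a" * 46 + "b" * 37 + "c" * 45
    for label, data in (("random-ish", random_ish), ("chunky", chunky)):
        print(label, bench(data))
```

Passing a callable to timeit avoids the string-plus-setup form used in the question, so the function under test receives exactly the data it is handed.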