I was testing the speeds of a few different ways to do complex iterations over some of my data, and I found something weird. It seems that having a large list local to some function slows down that function considerably, even if it is not touching that list. For example, creating 2 independent lists via 2 instances of the same generator function is about 2.5x slower the second time. If the first list is removed prior to creating the second, both iterators go at the same spee.
def f():
l1, l2 = [], []
for c1, c2 in generatorFxn():
l1.append((c1, c2))
# destroying l1 here fixes the problem
for c3, c4 in generatorFxn():
l2.append((c3, c4))
The lists end up about 3.1 million items long each, but I saw the same effect with smaller lists too. The first for loop takes about 4.5 seconds to run, the second takes 10.5. If I insert l1= [] or l1= len(l1) at the comment position, both for loops take 4.5 seconds.
Why does the speed of local memory allocation in a function have anything to do with the current size of that function’s variables?
EDIT:
Disabling the garbage collector fixes everything, so must be due to it running constantly. Case closed!
When you create that many new objects (3 million tuples), the garbage collector gets bogged down. If you turn off garbage collection with gc.disable(), the issue goes away (and the program runs 4x faster to boot).