Okay, so I probably shouldn’t be worrying about this anyway, but I’ve got some code that is meant to pass a (possibly very long, possibly very short) list of possibilities through a set of filters and maps and other things, and I want to know if my implementation will perform well.
As an example of the type of thing I want to do, consider this chain of operations:
- get all numbers from 1 to 100
- keep only the even ones
- square each number
- generate all pairs [i, j] with i in the list above and j in [1, 2, 3, 4, 5]
- keep only the pairs where i + j > 40
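The chain above can be sketched as a series of generator expressions. This is only an illustration, assuming Python 3 (where range is already lazy); the function name pairs_pipeline is made up for the example.

```python
def pairs_pipeline():
    """Build the chain lazily; nothing is computed until it is iterated."""
    numbers = range(1, 101)                        # 1 to 100
    evens = (n for n in numbers if n % 2 == 0)     # keep only the even ones
    squares = (n * n for n in evens)               # square each number
    pairs = ([i, j] for i in squares for j in [1, 2, 3, 4, 5])
    return (p for p in pairs if p[0] + p[1] > 40)  # keep pairs with i + j > 40


pipeline = pairs_pipeline()
first = next(pipeline)  # only the work needed for the first pair is done
```

Each stage pulls items from the previous one on demand, so asking for the first pair only processes the inputs 2, 4 and 6.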
Now, after doing all this nonsense, I want to look through this set of pairs [i, j] for a pair which satisfies a certain condition. Usually, the solution is one of the first entries, in which case I don’t even look at any of the others. Sometimes, however, I have to consume the entire list, and I don’t find the answer and have to throw an error.
I want to implement my “chain of operations” as a sequence of generators, i.e., each operation iterates through the items generated by the previous generator and “yields” its own output item by item (a la SICP streams). That way, if I never look at the last 300 entries of the output, they don’t even get processed. I know that itertools provides things like imap and ifilter for many of the operations I want to perform.
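To make the "never processed" part concrete, here is a rough sketch, assuming Python 3, where the built-in map and filter are already lazy like imap and ifilter in 2.x. The helper find_first, the processed list, and the condition i + j == 66 are all hypothetical and stand in for whatever the real search looks like.

```python
processed = []  # records which inputs were actually squared


def square(n):
    processed.append(n)
    return n * n


def find_first(predicate, iterable):
    """Return the first item satisfying predicate, or raise if none does."""
    for item in iterable:
        if predicate(item):
            return item
    raise LookupError("no matching pair found")


evens = (n for n in range(1, 101) if n % 2 == 0)
squares = (square(n) for n in evens)
pairs = ([i, j] for i in squares for j in [1, 2, 3, 4, 5])
candidates = (p for p in pairs if p[0] + p[1] > 40)

# Hypothetical search condition, chosen only for illustration.
match = find_first(lambda p: p[0] + p[1] == 66, candidates)
```

Here the search stops at [64, 2], and processed contains only 2, 4, 6 and 8; the remaining even numbers are never squared.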
My question is: will a series of nested generators be a major performance hit in the cases where I do have to iterate through all possibilities?
I tried two implementations, one using generators and one without generators. I tested it in Python 2.7, so range returns a list rather than an iterator. Here are the implementations:
Using Generators
Without Generators
Mixing Both so as not to append a list
Creating Temporary Lists
Here are my results
Conclusion:
Generator expressions are powerful, and you can optimize them to a much greater extent. As you can see in the example, foo2, which is the slowest, had a hard time repeatedly appending to a single list, and that killed its performance. foo3 and foo4 take almost the same time, so creating a temporary list was apparently not a bottleneck, since it was created only once in the whole iteration. Without generators you soon run into performance issues such as appending to a list or creating temporary lists. Lazy evaluation gives you an edge over these bottlenecks.
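A minimal sketch of how a comparison like this could be timed, assuming Python 3 and the standard timeit module. The functions eager_pipeline and lazy_pipeline are hypothetical and are not the foo1 to foo4 discussed above; they just contrast list appending with a generator chain on the question's example.

```python
import timeit


def eager_pipeline():
    """Build every intermediate list explicitly with append."""
    evens = []
    for n in range(1, 101):
        if n % 2 == 0:
            evens.append(n)
    squares = []
    for n in evens:
        squares.append(n * n)
    result = []
    for i in squares:
        for j in [1, 2, 3, 4, 5]:
            if i + j > 40:
                result.append([i, j])
    return result


def lazy_pipeline():
    """Same chain as nested generator expressions, consumed in full."""
    evens = (n for n in range(1, 101) if n % 2 == 0)
    squares = (n * n for n in evens)
    pairs = ([i, j] for i in squares for j in [1, 2, 3, 4, 5])
    return list(p for p in pairs if p[0] + p[1] > 40)


if __name__ == "__main__":
    for fn in (eager_pipeline, lazy_pipeline):
        seconds = timeit.timeit(fn, number=1000)
        print(fn.__name__, seconds)
```

Both functions consume the whole chain, which is the worst case the question asks about; actual timings depend on the interpreter version and the machine.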