General question: is there a preferable style for building up a list, in terms of efficiency, assuming that you have to do it within a loop? For example, is one of these options preferable for building a list of integers:
mylist = []
for x, y in mystuff:
    # x, y are strings that need to be
    # added sequentially to list
    mylist.extend([int(x), int(y)])
versus
for x, y in mystuff:
    mylist.append(int(x))
    mylist.append(int(y))
Or any others? I'm open to using scipy/numpy for this if relevant. Thanks.
If you need to micro-optimize like this, the only way to know what’s fastest is to test.
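One way to run such a test is with the standard library's `timeit` module. The following is a minimal sketch, not the harness used for the timings in this answer; the sample data and the names `make_stuff`, `f_extend`, and `f_append` are made up for illustration.

```python
import timeit


def make_stuff(n=1000):
    # Hypothetical sample data: n pairs of numeric strings.
    return [(str(i), str(i + 1)) for i in range(n)]


def f_extend(mystuff):
    # Build the list with one extend() call per pair.
    mylist = []
    for x, y in mystuff:
        mylist.extend([int(x), int(y)])
    return mylist


def f_append(mystuff):
    # Build the list with two append() calls per pair.
    mylist = []
    for x, y in mystuff:
        mylist.append(int(x))
        mylist.append(int(y))
    return mylist


if __name__ == "__main__":
    data = make_stuff()
    for func in (f_extend, f_append):
        # Take the best of several repeats to reduce noise from other processes.
        best = min(timeit.repeat(lambda: func(data), number=100, repeat=5))
        print("%-10s %.4f s per 100 calls" % (func.__name__, best))
```

The absolute numbers depend on the interpreter, build, and machine, so only the relative ordering on your own platform is meaningful.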
The short version is:
`append` is faster than `extend`, and Joran Beasley’s suggestion `itertools.chain.from_iterable` is slightly faster than either, but only if you replace the `map` with a list comprehension.

So:
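A minimal sketch of the two `chain.from_iterable` variants being compared, assuming `mystuff` is an iterable of pairs of numeric strings. The function names and sample data are illustrative, not taken from the original benchmark.

```python
from itertools import chain


def f_chain_map(mystuff):
    # map version; the outer list() is required on Python 3, where map is lazy.
    return list(map(int, chain.from_iterable(mystuff)))


def f_chain_listcomp(mystuff):
    # List-comprehension version over the flattened stream of strings.
    return [int(x) for x in chain.from_iterable(mystuff)]


# Hypothetical sample input.
mystuff = [("1", "2"), ("3", "4")]
result = f_chain_listcomp(mystuff)
```

Both functions produce the same list; they differ only in how each string is converted once `chain.from_iterable` has flattened the pairs.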
For Python 2, I tried the `map` versions without the extra `list`, and they were slightly faster, but still not nearly competitive. For Python 3, of course, the `list` is necessary.

Here are my timings:
My `python` is Apple’s stock Python 2.7.2, while `python3` is the python.org 3.3.0, both 64-bit, both on OS X 10.8.2, on a mid-2012 MacBook Pro with a 2.2GHz i7 and 4GB of RAM.

If you’re using 32-bit Python on a POSIX platform: I’ve noticed that somewhere in the not-too-distant past, iterators got an optimization that seems to have sped up many things in `itertools` in 64-bit builds, but slowed them down in 32-bit. So, you may find that `append` wins in that case. (As always, test on the platform(s) you actually care about optimizing.)

Ashwini Chaudhary linked to "Flattening a shallow list in Python", which further linked to "finding elements in python association lists efficiently". I suspect part of the difference between my results and theirs was improvements in iterators between 2.6.0 and 2.7.2/3.3.0, but the fact that we’re explicitly using 2-element elements instead of larger ones is probably even more important.
Also, at least one of the answers claimed that `reduce` was the fastest. The `reduce` implementations in the original post are all terribly slow, but I was able to come up with faster versions. They still aren’t competitive with `append` or `chain.from_iterable`, but they’re in the right ballpark.

The `f_numpy` function is heltonbiker’s implementation. Since `mystuff` is a 2D iterator, this actually just generates a 0D array wrapping the iterator, so all `numpy` can do is add overhead. I was able to come up with an implementation that generates a 1D array of iterators, but that was even slower, because now all `numpy` can do is add overhead N times as often. The only way I could get a 2D array of integers was by calling `list` first, as in `f_numpy2`, which made things even slower. (To be fair, throwing an extra `list` into the other functions slowed them down too, but not nearly as badly as with `numpy`.)

However, it’s quite possible that I’m blanking here, and there is a reasonable way to use `numpy` here. Of course, if you can be sure either the top-level `mystuff` or each element in `mystuff` is a `list` or a `tuple`, you can write something better, and if you can redesign your app so you have a 2D `numpy.array` in the first place, instead of a general sequence of sequences, that’ll be a whole different story. But if you just have a general 2D iteration of sequences, `numpy` doesn’t seem very good for this use case.
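To illustrate the last point, here is a hedged sketch of the friendlier case where the data is already a concrete list of pairs, so `numpy` can build a real 2D array. The sample data is made up, and this is not one of the functions timed above; whether it beats the pure-Python versions depends on input size, so it should be timed rather than assumed.

```python
import numpy as np

# Assumption: mystuff is a concrete list of pairs of numeric strings,
# not a general iterator of sequences.
mystuff = [("1", "2"), ("3", "4"), ("5", "6")]

arr = np.array(mystuff)         # 2D array of strings, shape (3, 2)
flat = arr.astype(int).ravel()  # convert to integers, then flatten row by row
mylist = flat.tolist()          # back to a plain Python list, if one is needed
```

If the rest of the program can work with the array directly, the final `tolist()` call can be dropped.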