When iterating over a large array with a range expression, should I use Python’s built-in range function, or numpy’s arange to get the best performance?
My reasoning so far:
range probably uses a native implementation and might therefore be faster. On the other hand, arange returns a full array, which occupies memory, so there might be an overhead. Python 3’s range is a lazy sequence object rather than a list, so it does not hold all the values in memory.
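The memory side of this reasoning can be checked directly. The following is only a minimal sketch, assuming CPython with numpy installed; the variable names are illustrative and the exact byte counts depend on the platform.

```python
import sys
import numpy as np

n = 1_000_000

lazy = range(n)       # lazy sequence: stores only start, stop and step
dense = np.arange(n)  # allocates an array holding all n integers

range_bytes = sys.getsizeof(lazy)  # size of the range object itself
array_bytes = dense.nbytes         # size of the array's data buffer

print("range object:", range_bytes, "bytes")
print("arange array:", array_bytes, "bytes")
```

The range object stays the same small size however large n is, while the array grows linearly with n.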
For large arrays, a vectorised numpy operation is the fastest. If you must loop, prefer xrange/range and avoid using np.arange.

In numpy you should use combinations of vectorized calculations, ufuncs and indexing to solve your problems, as these run at C speed. Looping over numpy arrays is inefficient compared to this.
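To make the contrast concrete, here is a rough sketch that computes the same sum of squares once with an index loop and once with a single vectorised call. The function names and the array size are made up for illustration, and the timings will differ from machine to machine, so treat the output as indicative only.

```python
import timeit
import numpy as np

data = np.arange(100_000, dtype=np.float64)

def sum_squares_indexed(a):
    """Python-level loop: each a[i] access goes through the interpreter."""
    total = 0.0
    for i in range(len(a)):
        total += a[i] * a[i]
    return total

def sum_squares_vectorised(a):
    """Single numpy call: the loop runs inside compiled code."""
    return float(np.dot(a, a))

loop_result = sum_squares_indexed(data)
vec_result = sum_squares_vectorised(data)

# Time a few repetitions of each variant on this machine.
loop_time = timeit.timeit(lambda: sum_squares_indexed(data), number=3)
vec_time = timeit.timeit(lambda: sum_squares_vectorised(data), number=3)

print("indexed loop:", loop_time, "s")
print("vectorised:  ", vec_time, "s")
```

Both functions return the same value; only the place where the loop executes changes.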
(About the worst thing you could do is iterate over the array with an index created with range or np.arange, as the first sentence in your question suggests, but I’m not sure if you really mean that.)

So for this case numpy is 4 times faster than using xrange if you do it right. Depending on your problem, the speedup from numpy can be much greater than 4 or 5 times.

The answers to this question explain some more advantages of using numpy arrays instead of python lists for large data sets.