It was my understanding that np.apply_along_axis was a viable substitute for iterating over Numpy arrays, since looping the plain-Python way has overhead that slows things down; however, it seems that iterating takes only ~9% of the time apply_along_axis does! Piggybacking on this previous post, I decided to do a quick timing for myself:
import numpy as np
import timeit
def triv():
    ial = [i for i in xrange(100)]
def super(fluous):
    return fluous
>>> print (timeit.timeit("triv()", setup="from __main__ import triv"))
12.3305490909
>>> print (timeit.timeit("np.apply_along_axis(super, 0, np.arange(100))", setup="from __main__ import np, super"))
130.721563921
Why is this the case? I don’t know the intricacies of timeit that well (or much of anything about timeit, for that matter), but I think my examples are straightforward enough. Has anyone found a good workaround? My real-world problem, iterating over the Cartesian product of the rows of very large arrays, is so slow that it’s impeding progress.
Thanks in advance.
In the first place, your example is strange because apply_along_axis on a 1D array will just get passed the whole array as a single argument. It isn’t called repeatedly for each element. For that you’d want vectorize.
More generally, though, numpy can’t really speed up the application of arbitrary Python functions. The main advantage of numpy is that it provides its own implementations of lots of mathematical functions, and those are fast. It can’t magically take any function and just make it go faster.
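To make the point about 1D arrays concrete, here is a minimal sketch that counts how often the function is actually called. The helper name record and the calls list are made up for illustration, and the code is written for Python 3. Passing otypes to np.vectorize is my choice here: without it, vectorize makes one extra probing call to work out the output type.

```python
import numpy as np

calls = []  # hypothetical bookkeeping list, not part of the question's code

def record(x):
    # Remember every argument so the calls can be counted afterwards.
    calls.append(x)
    return x

arr = np.arange(5)

# On a 1D array there is only one slice along axis 0,
# so the function receives the whole array in a single call.
np.apply_along_axis(record, 0, arr)
along_axis_calls = len(calls)

calls.clear()

# np.vectorize calls the function once per element instead.
np.vectorize(record, otypes=[int])(arr)
vectorize_calls = len(calls)

print(along_axis_calls, vectorize_calls)
```

The first count is 1 and the second equals the number of elements, which is why the timing in the question is not measuring a per-element loop at all.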
In addition, your examples aren’t exactly parallel. You’re not calling the same function in both, for one thing. More specifically, your tests both include creating the input in the timed test: you create xrange(100) and np.arange(100) inside the timed part of the test. So part of what you’re measuring is that it’s slower to create a Numpy array than to create an xrange object. That’s roughly a factor of 5 right there. But in a real application, you’d almost certainly already have the input array created, so this isn’t a realistic test.
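If you want to check the input-creation cost on your own machine, something like the following should do. It is only a sketch: it assumes Python 3, so range stands in for xrange, and the repeat count is arbitrary. The actual ratio will depend on your Python and numpy versions, so no numbers are shown.

```python
import timeit
import numpy as np

N_RUNS = 100000  # arbitrary repeat count chosen for this sketch

# Time only the construction of each input, nothing else.
range_time = timeit.timeit("range(100)", number=N_RUNS)
arange_time = timeit.timeit("np.arange(100)", globals={"np": np}, number=N_RUNS)

print("range(100):     %.4f s" % range_time)
print("np.arange(100): %.4f s" % arange_time)
```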
Using parallel tests (the same function in both, with the inputs built outside the timed code), I find that the plain-list version is only about twice as fast.
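A parallel comparison might look roughly like this. It is a sketch under my own assumptions rather than the exact test behind the figure above: it uses Python 3, a hypothetical identity function in place of super, and np.vectorize for the per-element numpy version. Both inputs are created once, before any timing starts.

```python
import timeit
import numpy as np

def identity(x):
    # Hypothetical stand-in for the question's do-nothing function.
    return x

# Build both inputs up front so their creation is not timed.
data_list = list(range(100))
data_array = np.arange(100)
vectorized_identity = np.vectorize(identity)

def loop_version():
    # Plain Python: one function call per list element.
    return [identity(i) for i in data_list]

def numpy_version():
    # numpy: the same Python function, applied per array element.
    return vectorized_identity(data_array)

loop_time = timeit.timeit(loop_version, number=10000)
numpy_time = timeit.timeit(numpy_version, number=10000)

print("list comprehension: %.4f s" % loop_time)
print("np.vectorize:       %.4f s" % numpy_time)
```

Either way the Python function still runs once per element, so numpy has nothing to speed up here.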
Moreover, the numpy version can be substantially faster if what you’re applying is a numpy function.
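As an illustration of that last case, here is a sketch comparing a Python loop over math.sqrt with a single call to np.sqrt. The choice of square root, the array size and the repeat count are all my own assumptions for the example, and I have left the timings out because they vary by machine.

```python
import math
import timeit
import numpy as np

# Inputs are created once, outside the timed code.
data_list = [float(i) for i in range(10000)]
data_array = np.arange(10000, dtype=float)

def loop_sqrt():
    # One Python-level call to math.sqrt per element.
    return [math.sqrt(v) for v in data_list]

def numpy_sqrt():
    # One call; the loop over elements runs inside numpy's compiled code.
    return np.sqrt(data_array)

loop_time = timeit.timeit(loop_sqrt, number=200)
numpy_time = timeit.timeit(numpy_sqrt, number=200)

print("math.sqrt loop: %.4f s" % loop_time)
print("np.sqrt:        %.4f s" % numpy_time)
```

The difference comes from where the loop runs: np.sqrt never calls back into Python for individual elements.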
The moral of the story is that numpy really is NUMpy — it’s made for doing numerical calculations, and it has functions for doing them fast. It’s not just a thing that speeds up all your loops. If you just have big arrays of objects that you’re applying arbitrary functions to, numpy is unlikely to speed up your code, and may even slow it down. (It can still be very useful for non-numeric data because its facilities for things like complicated indexing into multidimensional arrays are convenient and may be faster than an equivalent Python structure using nested lists or the like. It’s just not useful for speeding up loops over these structures.)