I need to generate a lot of random numbers. I’ve tried using random.random, but this function is quite slow, so I switched to numpy.random.random, which is much faster. So far so good. The generated random numbers are then used to calculate something (based on each number), so I enumerate over the numbers and replace each value. This seems to kill all the speedup I had gained. Here are the stats generated with timeit():
test_random - no enumerate
0.133111953735
test_np_random - no enumerate
0.0177130699158
test_random - enumerate
0.269361019135
test_np_random - enumerate
1.22525310516
As you can see, generating the numbers is almost 10 times faster using numpy, but once I enumerate over them the advantage is gone: the numpy version ends up several times slower than the plain Python one.
Below is the code that I’m using:
import numpy as np
import timeit
import random

NBR_TIMES = 10
NBR_ELEMENTS = 100000

def test_random(do_enumerate=False):
    y = [random.random() for i in range(NBR_ELEMENTS)]
    if do_enumerate:
        for index, item in enumerate(y):
            # overwrite the y value, in reality this will be some function of 'item'
            y[index] = 1 + item

def test_np_random(do_enumerate=False):
    y = np.random.random(NBR_ELEMENTS)
    if do_enumerate:
        for index, item in enumerate(y):
            # overwrite the y value, in reality this will be some function of 'item'
            y[index] = 1 + item

if __name__ == '__main__':
    from timeit import Timer

    t = Timer("test_random()", "from __main__ import test_random")
    print("test_random - no enumerate")
    print(t.timeit(NBR_TIMES))

    t = Timer("test_np_random()", "from __main__ import test_np_random")
    print("test_np_random - no enumerate")
    print(t.timeit(NBR_TIMES))

    t = Timer("test_random(True)", "from __main__ import test_random")
    print("test_random - enumerate")
    print(t.timeit(NBR_TIMES))

    t = Timer("test_np_random(True)", "from __main__ import test_np_random")
    print("test_np_random - enumerate")
    print(t.timeit(NBR_TIMES))
What’s the best way to speed this up and why does enumerate slow things down so dramatically?
EDIT: the reason I use enumerate is because I need both the index and the value of the current element.
To take full advantage of numpy’s speed, you want to create ufuncs whenever possible. Applying vectorize to a function, as mgibsonbr suggests, is one way to do that, but a better way, if possible, is simply to construct a function that takes advantage of numpy’s built-in ufuncs.

What is the nature of the function you want to apply across the numpy array? If you tell us, perhaps we can help you come up with a version that uses only numpy ufuncs.
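As a rough sketch of the ufunc-based approach described above: the question does not say what the real per-element function is, so the `transform` function below is a hypothetical stand-in, assumed to be expressible with numpy’s built-in ufuncs (`np.sqrt`, `np.exp`, and arithmetic operators). The point is only that one call over the whole array replaces the Python-level loop.

```python
import numpy as np

NBR_ELEMENTS = 100000  # same size as in the question


def transform(values):
    # Hypothetical stand-in for "some function of item". It is built only
    # from NumPy ufuncs, so it operates on the whole array at once.
    return 1 + np.sqrt(values) * np.exp(-values)


y = np.random.random(NBR_ELEMENTS)
result = transform(y)  # one call, no Python-level loop over elements

# The same computation done element by element on a small slice,
# only to check that both give the same values.
looped = np.array([1 + np.sqrt(v) * np.exp(-v) for v in y[:10]])
```

If the real function cannot be written this way, wrapping it with vectorize still gives array-style calling, but it loops in Python internally, so it should not be expected to match the speed of built-in ufuncs.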
It’s also possible to generate an array of indices without using enumerate. Numpy provides ndenumerate, which is an iterator and probably slower, but it also provides indices, which is a very quick way to generate the indices corresponding to the values in an array.

To be more explicit, you can build the values and their indices as separate arrays and combine them using numpy.rec.fromarrays.

It’s starting to sound like your main concern is performing the operation in-place. That’s harder to do using vectorize, but it’s easy with the ufunc approach, since numpy can perform these operations in-place.
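A minimal sketch of the three ideas above, under some assumptions: the array is kept tiny for readability, the field names `idx` and `value` are made up for illustration, and the final index-dependent update is a hypothetical example of needing both the index and the value. It uses `np.indices` for the index array, `np.rec.fromarrays` to pair indices with values, and the `out=` argument of ufuncs for in-place updates.

```python
import numpy as np

y = np.random.random(5)

# np.indices returns one index array per dimension; for a 1-D array
# the first (and only) entry holds 0, 1, 2, ...
idx = np.indices(y.shape)[0]

# Pair each index with its value in a record array (this copies the data).
pairs = np.rec.fromarrays([idx, y], names="idx,value")

# In-place update: out=y writes the result back into y instead of
# allocating a new array. This is equivalent to `y += 1`.
before = y.copy()
np.add(y, 1, out=y)

# Hypothetical index-dependent update, also performed in place.
np.multiply(y, idx, out=y)
```

Because the updates write into the existing buffer, any other reference to `y` sees the new values, which is usually what an in-place requirement is after.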