Below are two simple Cython functions I wrote. In g_cython() I added buffer typing for the NumPy arrays a and b, but surprisingly g_cython() is twice as slow as g_less_cython(). Why is this happening? I thought the typing would make indexing into a and b much faster.
P.S. I understand both functions can be vectorized in NumPy; I am just exploring Cython optimization tricks.
```cython
import numpy as np
cimport numpy as np

def g_cython(np.ndarray[np.int_t, ndim=1] a, percentile):
    cdef int i
    cdef int n = len(a)
    # b is a typed buffer like a, so a[i] and b[i] compile to direct buffer access
    cdef np.ndarray[np.int_t, ndim=1] b = np.zeros(n, dtype='int')
    for i in range(n):
        b[i] = np.searchsorted(percentile, a[i])
    return b

def g_less_cython(a, percentile):
    cdef int i
    # a and b are untyped Python objects, so indexing goes through the Python API
    b = np.zeros_like(a)
    for i in range(len(a)):
        b[i] = np.searchsorted(percentile, a[i])
    return b
```
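Both functions compute the same result. For each element of a, they find its insertion index into the sorted percentile array. A small pure-Python reference for that computation is handy for checking that the two versions agree, and that any faster rewrite does too. This is a sketch, not part of the original code, and `reference_indices` is a hypothetical name. It assumes percentile is sorted in ascending order, which np.linspace guarantees. Under that assumption, `bisect.bisect_left` returns the same index as `np.searchsorted` with its default `side='left'`.

```python
from bisect import bisect_left

def reference_indices(a, percentile):
    """Hypothetical helper: the index np.searchsorted(percentile, x) would
    return for each x in a, assuming percentile is sorted ascending."""
    # bisect_left gives the leftmost position where x could be inserted
    # while keeping percentile sorted, i.e. searchsorted's side='left'.
    return [bisect_left(percentile, x) for x in a]

print(reference_indices([0, 5, 10, 11], [0, 5, 10]))  # [0, 1, 2, 3]
```

On a small input, comparing `list(g_cython(a, per))` with `reference_indices(a, per)` is a cheap sanity check.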
My test case uses len(a) == 1000000 and len(percentile) == 100:
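```python
import time
import numpy as np
# g_cython and g_less_cython are assumed to be imported from the compiled .pyx module

def main3():
    n = 100000
    # randint's upper bound is exclusive, so this matches the inclusive range
    # of the deprecated np.random.random_integers(0, 10000000, n)
    a = np.random.randint(0, 10000001, n)
    per = np.linspace(0, 10000000, 101)
    q = time.time()
    b = g_cython(a, per)
    q = time.time() - q
    print(q)
    q = time.time()
    bb = g_less_cython(a, per)
    q = time.time() - q
    print(q)
```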
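A single `time.time()` measurement per function is noisy, especially for short runs. The standard-library `timeit` module can repeat the call and keep the fastest run. Its documentation recommends the minimum, since slower runs mostly reflect interference from other processes. The sketch below uses it; `best_time` is a hypothetical helper name, not part of the original script.

```python
import timeit

def best_time(func, *args, repeat=5, number=1):
    """Hypothetical helper: fastest time, in seconds, of one call to
    func(*args), taken over `repeat` timing runs."""
    # timeit.repeat accepts a zero-argument callable as the statement;
    # each run calls it `number` times and reports that run's total time.
    totals = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(totals) / number

print(best_time(sorted, list(range(1000))))
```

Inside `main3`, calls such as `print(best_time(g_cython, a, per))` and `print(best_time(g_less_cython, a, per))` could replace the manual `time.time()` bookkeeping.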
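I tested your code, and g_cython is slightly faster than g_less_cython.
Here is the test code:
The output is:
I then turned off the boundscheck and wraparound flags:
The difference is small because the call to np.searchsorted(percentile, a[i]) is the critical part and uses most of the CPU time. Each iteration still converts a[i] to a Python object and makes a full Python-level call into NumPy. Speeding up the indexing of a and b therefore saves only a small fraction of the per-iteration cost.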
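Since the per-element NumPy call is the bottleneck, one Cython-style trick is to replace it with a hand-written binary search, so the whole loop can run at C level. The sketch below is plain Python so that it runs anywhere. Run as plain Python, it will not be faster. The idea, which is an untested assumption here, is that the same loop compiled in a `.pyx` file would avoid the per-call overhead. That requires `Py_ssize_t` indices, typed memoryviews for a, b and percentile, and `lower_bound` as a `cdef` function. The names `lower_bound` and `g_binary_search` are hypothetical. The search assumes percentile is sorted ascending and mirrors `np.searchsorted(percentile, x)` with the default `side='left'`.

```python
def lower_bound(sorted_vals, x):
    """Hypothetical helper: leftmost index i such that every element before i
    is < x, i.e. what np.searchsorted(sorted_vals, x) returns (side='left')."""
    lo, hi = 0, len(sorted_vals)  # in Cython: cdef Py_ssize_t lo, hi, mid
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if sorted_vals[mid] < x:
            lo = mid + 1  # the answer lies strictly to the right of mid
        else:
            hi = mid      # sorted_vals[mid] >= x, so mid is still a candidate
    return lo


def g_binary_search(a, percentile):
    """Hypothetical pure-Python counterpart of g_cython without np.searchsorted.
    In a .pyx file, a, percentile and b would be typed memoryviews."""
    b = [0] * len(a)
    for i in range(len(a)):
        b[i] = lower_bound(percentile, a[i])
    return b


print(g_binary_search([0, 5, 10, 11], [0, 5, 10]))  # [0, 1, 2, 3]
```

Duplicates in percentile resolve to the leftmost matching position, the same as `side='left'`. It is worth checking the results against the original functions on a small input before benchmarking a compiled version.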