I’m trying to solve the bottleneck in my application, which is an elementwise sum of two matrices.
I’m using NumPy and Cython. I have a cdef class with a matrix attribute. Since Cython still doesn’t support buffer arrays in class attributes, I followed this and tried to use a pointer to the data attribute of the matrix. The thing is, I’m sure I’m doing something wrong, as the results indicate.
What I tried to do is more or less the following:
cdef class the_class:
    cdef np.ndarray the_matrix
    cdef float_t* the_matrix_p

    def __init__(self):
        the_matrix_p = <float_t*> self.the_matrix.data

    cpdef the_function(self):
        other_matrix = self.get_other_matrix()
        the_matrix_p += other_matrix.data
I seriously doubt that adding two NumPy arrays is a bottleneck you can solve by rewriting things in C. See the following code, which uses scipy.weave:
Once you run c_sum(a, b, c) once to get the C code compiled, these are the timings I get:
So it seems you are looking at something like a 0.3% performance improvement, if the timing differences are not simply random noise, on an operation that takes a handful of ms when working on arrays of ten million elements. If it really is a bottleneck, this is hardly going to solve it.
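If you want to check how fast the plain NumPy sum already is on your own machine, a minimal sketch along these lines should do. It uses only NumPy and the standard timeit module, not scipy.weave. The array size, the variable names and the helper functions sum_new_array and sum_into_buffer are assumptions made up for illustration, and I am not quoting any timings because they depend entirely on your hardware.

```python
import timeit

import numpy as np

# Hypothetical size and names, chosen only for illustration.
n = 1_000_000
rng = np.random.default_rng(0)
a = rng.random(n)
b = rng.random(n)
c = np.empty_like(a)  # preallocated output buffer


def sum_new_array():
    # Allocates a fresh result array on every call.
    return a + b


def sum_into_buffer():
    # Writes the elementwise sum into the existing buffer c.
    np.add(a, b, out=c)
    return c


# Both variants must produce the same values.
result_new = sum_new_array()
result_buf = sum_into_buffer()
same = np.array_equal(result_new, result_buf)

# Take the best of several repeats to reduce the influence of noise.
calls = 20
t_new = min(timeit.repeat(sum_new_array, number=calls, repeat=5)) / calls
t_buf = min(timeit.repeat(sum_into_buffer, number=calls, repeat=5)) / calls
print(f"a + b:               {t_new * 1e3:.3f} ms per call")
print(f"np.add(a, b, out=c): {t_buf * 1e3:.3f} ms per call")
```

Taking the minimum over several repeats is a deliberate choice: a difference of a fraction of a percent is easily swamped by whatever else the machine is doing, so a single run tells you very little.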