Can I eliminate all Python loops in this computation:
result[i,j,k] = (x[i] * y[j] * z[k]).sum()
where x[i], y[j], z[k] are vectors of length N and x,y,z have first dimensions with length A,B,C s.t. output is shape (A,B,C) and each element is
the sum of a triple-product (element-wise).
I can get it down from 3 to 1 loops (code below), but am stuck trying to
eliminate the last loop.
If necessary I could make A=B=C (via small amount of padding).
# Example with 3 loops, 2 loops, 1 loop (testing omitted)
N = 100 # more like 100k in real problem
A = 2 # more like 20 in real problem
B = 3 # more like 20 in real problem
C = 4 # more like 20 in real problem
import numpy
x = numpy.random.rand(A, N)
y = numpy.random.rand(B, N)
z = numpy.random.rand(C, N)
# outputs of each variant
result_slow = numpy.empty((A,B,C))
result_vec_C = numpy.empty((A,B,C))
result_vec_CB = numpy.empty((A,B,C))
# 3 nested loops
for i in range(A):
for j in range(B):
for k in range(C):
result_slow[i,j,k] = (x[i] * y[j] * z[k]).sum()
# vectorize loop over C (2 nested loops)
for i in range(A):
for j in range(B):
result_vec_C[i,j,:] = (x[i] * y[j] * z).sum(axis=1)
# vectorize one C and B (one loop)
for i in range(A):
result_vec_CB[i,:,:] = numpy.dot(x[i] * y, z.transpose())
numpy.testing.assert_almost_equal(result_slow, result_vec_C)
numpy.testing.assert_almost_equal(result_slow, result_vec_CB)
If you are using numpy > 1.6, there is the awesome
np.einsumfunction:Which is equivalent to your looped versions. I’m not sure how this will fair on efficiency once you get up to the size of your arrays in the real problem (I’m actually getting a segfault on my machine, when I move to those sizes). The other solution which I often prefer for these sorts of problems is re-writing the method using cython.