I need to calculate standard deviation and other stats on a large multidimensional ndarray

Question

Asked: May 31, 2026 (2026-05-31T11:56:58+00:00)

I need to calculate standard deviation and other stats on a large multidimensional ndarray of gridded point data. Example:

import numpy as np
# ... gridded data are read into g1, g2, g3 arrays ...
allg = np.array([g1, g2, g3])
allmg = np.ma.masked_values(allg, -99.)
# std over axis 0 collapses the three grids, so the output is a single 3315 x 8325 grid
sd = np.zeros((3315, 8325))
np.std(allmg, axis=0, ddof=1, out=sd)

I’ve seen the performance advantages of wrapping numpy calculations in numexpr.evaluate() on various websites, but I don’t think there’s a way to run np.std() in numexpr.evaluate() (correct me if I’m wrong). Are there any other ways I can optimize the np.std() call? It currently takes about 18 sec to calculate on my system, and I’m hoping to make that much faster somehow.

1 Answer

Editorial Team · Answer 1 · 2026-05-31T11:57:00+00:00

You could use multiprocessing to split the calculation across several processes. Before trying that, though, try rearranging your data so that std() runs over the last axis. Here is an example:

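import numpy as np
import time

data = np.random.random((4000, 4000))

# std along the first axis: each reduction strides across rows
start = time.perf_counter()
np.std(data, axis=0)
print(time.perf_counter() - start)

# std along the last axis: each reduction reads contiguous memory
start = time.perf_counter()
np.std(data, axis=1)
print(time.perf_counter() - start)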

The result on my PC is:

0.511926329834
0.273098421142

Since the data along the last axis are contiguous in memory (for NumPy's default C-ordered arrays), data access uses the CPU cache more effectively.
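Applied to the arrays in the question, the rearrangement might look like the sketch below. The small random grids are hypothetical stand-ins for g1, g2 and g3 (the real ones are 3315 x 8325), and whether the last-axis layout is actually faster with only three values per grid point is an assumption worth timing on the real data.

```python
import numpy as np

# Hypothetical small stand-ins for the three grids from the question;
# -99. marks a missing point, as in the question.
rng = np.random.default_rng(0)
g1, g2, g3 = (rng.random((4, 5)) for _ in range(3))
g1[0, 0] = -99.0

# Layout from the question: the axis being reduced (the three grids) is first.
stack_first = np.ma.masked_values(np.array([g1, g2, g3]), -99.0)
sd_first = np.std(stack_first, axis=0, ddof=1)

# Rearranged layout: the three values for each grid point sit next to each
# other in memory, and the reduction runs over the last axis.
stack_last = np.ma.masked_values(np.stack([g1, g2, g3], axis=-1), -99.0)
sd_last = np.std(stack_last, axis=-1, ddof=1)

# Both layouts give the same statistics; only the memory access pattern differs.
same = np.ma.allclose(sd_first, sd_last)
print(sd_last.shape, same)
```

The result has one standard deviation per grid point in either layout, so downstream code that expects a single 2-D grid does not need to change.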
