I want to center multi-dimensional data in an n x m matrix (<class 'numpy.matrixlib.defmatrix.matrix'>), let’s say X. I defined a new array ones(645), let’s say centVector, to hold the mean of every row in matrix X. Now I want to iterate over every row in X, compute the mean and assign this value to the corresponding index in centVector. Isn’t this possible in a single line in scipy/numpy? I am not used to this language and am thinking of something like:
centVector = ones(645)
for key, val in X:
centVector[key] = centVector[key] * (val.sum/val.size)
Afterwards I just need to subtract the mean from every row:
X = X - centVector
How can I simplify this?
EDIT: And besides, the above code is not actually working – for a key-value loop I need something like enumerate(X). And I am not sure if X - centVector is returning the proper solution.
First, some example data:
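A minimal sketch of such data, assuming a small 3 x 4 matrix; the values are arbitrary and chosen only for illustration:

```python
import numpy as np

# Hypothetical example data: 3 rows (samples) x 4 columns (features).
# The values are made up; any n x m matrix behaves the same way.
X = np.matrix([[1.0, 2.0, 3.0, 4.0],
               [10.0, 20.0, 30.0, 40.0],
               [5.0, 5.0, 5.0, 5.0]])

print(type(X))
print(X.shape)
```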
numpy conveniently has a mean function. By default, however, it gives you the mean over all the values in the array. Since you want the mean of each row, you need to specify the axis of the operation, e.g. X.mean(axis=1).
Note that axis=1 says: find the mean along the columns (for each row), where 0 = rows and 1 = columns (and so on). Because X is a matrix, the result is an n x 1 column, so you can subtract this mean from your X, as you did originally: X - X.mean(axis=1).
Unsolicited advice
Usually, it’s best to avoid the matrix class (see the docs). If you remove the np.matrix call from the example data, then you get a normal numpy array.
Unfortunately, in this particular case, using an array slightly complicates things because np.mean will return a 1D array, e.g. r_means = np.mean(X, axis=1).
If you try to subtract this from X, r_means gets broadcast to a row vector instead of a column vector, so the subtraction either fails with a shape error or, when X happens to be square, silently subtracts the wrong values.
So, you’ll have to reshape the 1D array into an N x 1 column vector: r_means.reshape((-1, 1)).
The -1 passed to reshape tells numpy to figure out this dimension based on the original array shape and the rest of the dimensions of the new array. Alternatively, you could have reshaped the array using r_means[:, np.newaxis].
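Putting the array version together, as a sketch only: the data values and the names X_centered, X_centered_alt and X_centered_kd are illustrative assumptions. The last line uses keepdims=True, an option of np.mean that is not mentioned above; it keeps the reduced axis so that no reshape is needed.

```python
import numpy as np

# Hypothetical example data as a plain ndarray (3 rows x 4 columns).
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 20.0, 30.0, 40.0],
              [5.0, 5.0, 5.0, 5.0]])

# Mean of each row; for an ndarray this is 1D with shape (3,).
r_means = np.mean(X, axis=1)

# X - r_means would fail here, because shapes (3, 4) and (3,) do not
# broadcast. For a square X it would run but subtract along the wrong axis.

# Reshape to an N x 1 column so each row gets its own mean subtracted.
X_centered = X - r_means.reshape((-1, 1))

# Equivalent: add a new axis instead of reshaping.
X_centered_alt = X - r_means[:, np.newaxis]

# Equivalent: let np.mean keep the reduced axis (result has shape (3, 1)).
X_centered_kd = X - X.mean(axis=1, keepdims=True)

print(X_centered)
```

After centering, every row of X_centered should have a mean of (numerically) zero, which is a quick way to check that the subtraction went along the intended axis.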