I have a large scipy.sparse.csc_matrix and would like to normalize it. That is, subtract the column mean from each element and divide by the column standard deviation (std).
scipy.sparse.csc_matrix has a .mean() but is there an efficient way to compute the variance or std?
You can calculate the variance yourself from the mean, using the following formula:

Var[X] = E[X^2] - (E[X])^2
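E[X] stands for the mean. So to calculate E[X^2] you would have to square the csc_matrix element-wise (for example with .multiply() or .power(2); the ** operator on a sparse matrix is a matrix power, not an element-wise one) and then use the mean function. To get (E[X])^2 you simply need to square the result of the mean function applied to the original matrix.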
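The steps above can be sketched roughly as follows. This is a minimal example, not a definitive implementation: the helper name column_mean_std and the small test matrix are made up for illustration, the result is the population std (ddof=0, the same default as numpy.std), and clipping tiny negative variances to zero is an added safeguard against floating-point round-off.

```python
import numpy as np
from scipy import sparse


def column_mean_std(X):
    """Column-wise mean and population std of a sparse matrix.

    Hypothetical helper: uses Var[X] = E[X^2] - (E[X])^2 per column,
    so the matrix is never converted to a dense array.
    """
    # E[X] per column; .mean() may return np.matrix, so flatten to 1-D.
    mean = np.asarray(X.mean(axis=0)).ravel()
    # E[X^2] per column; .multiply() squares element-wise and stays sparse.
    mean_sq = np.asarray(X.multiply(X).mean(axis=0)).ravel()
    # Round-off can leave tiny negative values, so clip at zero
    # (an assumption of this sketch, not part of the formula).
    var = np.maximum(mean_sq - mean ** 2, 0.0)
    return mean, np.sqrt(var)


# Small illustrative matrix; the middle column is all zeros.
dense = np.array([[1.0, 0.0, 2.0],
                  [0.0, 0.0, 3.0],
                  [4.0, 0.0, 0.0],
                  [0.0, 0.0, 5.0]])
X = sparse.csc_matrix(dense)

mean, std = column_mean_std(X)
```

Note that this formula can lose precision when the mean is large compared to the spread of the values, and that a constant column (such as an all-zero one) has std 0, so it needs special handling before dividing by the std.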