Good afternoon.
I am faced with a PCA task which simply involves reducing the dimensionality of a vector. I'm not interested in a two-dimensional matrix in this case, but merely a D-dimensional vector which I would like to project along its K principal eigenvectors.
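For reference, projecting a D-dimensional vector onto its K principal eigenvectors usually means estimating those eigenvectors from many D-dimensional observations first. The sketch below assumes a hypothetical data matrix X of shape (n_samples, D), with one observation per row. pca_project is an illustrative helper, not a NumPy function, and the random data stands in for real observations.

```python
import numpy as np

def pca_project(X, v, k):
    """Project v onto the k leading principal directions estimated from X.

    X : (n_samples, D) array, one observation per row (hypothetical data)
    v : (D,) array, the vector to reduce
    k : number of principal components to keep
    """
    mean = X.mean(axis=0)
    # rowvar=False: columns are variables (features), rows are observations
    C = np.cov(X, rowvar=False)                 # (D, D) covariance matrix
    # eigh handles symmetric matrices; eigenvalues come back in ascending order
    eigvals, eigvecs = np.linalg.eigh(C)
    top = np.argsort(eigvals)[::-1][:k]         # indices of the k largest eigenvalues
    W = eigvecs[:, top]                         # (D, k) matrix of principal eigenvectors
    # Center v with the data mean, then take its coordinates along each eigenvector
    return np.dot(v - mean, W)                  # (k,) reduced vector

# Usage with random stand-in data (100 observations of a 4-dimensional vector)
X = np.random.RandomState(0).normal(size=(100, 4))
v = np.array([1.0, 1.0, 2.0, -1.0])
z = pca_project(X, v, k=2)
print(z.shape)  # (2,)
```

The key point is that the covariance matrix comes from the collection of observations, not from the single vector being projected.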
In order to implement PCA, I need to retrieve the covariance matrix of this vector. Let’s try to do this on an example vector:
someVec = np.array([[1.0, 1.0, 2.0, -1.0]])
I've defined this vector as a 1 X 4 matrix, i.e. a row vector, in order to make it compatible with numpy.cov. Taking the covariance matrix of this vector through numpy.cov yields a scalar, because numpy.cov assumes by default that each row is a variable (feature) and each column is an observation:
print np.cov(someVec)
1.58333333333
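The scalar is simply the sample variance of the four entries, treated as four observations of a single variable and divided by N-1 = 3. A minimal check with the same someVec:

```python
import numpy as np

someVec = np.array([[1.0, 1.0, 2.0, -1.0]])
x = someVec[0]                              # the 4 observations of the single variable
dev = x - x.mean()                          # mean is 0.75
manual = (dev ** 2).sum() / (x.size - 1)    # 4.75 / 3
print(manual)                               # 1.5833333333333333
print(np.cov(someVec))                      # same value, returned as a 0-d array
```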
but this is (or rather, should be) merely a difference in dimensionality assumptions, and taking the covariance of the transposed vector should work fine, right? Except that it doesn't:
print np.cov(someVec.T)
/usr/lib/python2.7/site-packages/numpy/lib/function_base.py:2005: RuntimeWarning:
invalid value encountered in divide
return (dot(X, X.T.conj()) / fact).squeeze()
[[ nan nan nan nan]
[ nan nan nan nan]
[ nan nan nan nan]
[ nan nan nan nan]]
I’m not exactly sure what I’ve done wrong here. Any advice?
Thanks,
Jason
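If you want to pass in the transpose, you'll need to set rowvar to zero. From the docs: if rowvar is non-zero (the default), each row represents a variable, with observations in the columns; otherwise, each column represents a variable, while the rows contain observations.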
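A short sketch of the rowvar behaviour, using the question's someVec: the row vector with the default setting and the transposed column vector with rowvar=0 both describe one variable with four observations, so they give the same scalar.

```python
import numpy as np

someVec = np.array([[1.0, 1.0, 2.0, -1.0]])   # shape (1, 4)
col = someVec.T                                # shape (4, 1)

# Default rowvar: each ROW is a variable, so someVec is 1 variable x 4 observations
row_result = np.cov(someVec)
# rowvar=0: each COLUMN is a variable, so col is also 1 variable x 4 observations
col_result = np.cov(col, rowvar=0)

print(row_result, col_result)   # both are the same scalar variance
```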
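If you want to find a full covariance matrix, you'll need more than one observation. With a single observation and numpy's default estimator, NaN is exactly what you'd expect: every deviation from the mean is zero and the normalization factor N-1 is zero, so each entry is 0/0. If you would like normalization by N instead of (N-1), you can pass 1 to bias. Again, from the docs: the default normalization is by (N-1), where N is the number of observations (an unbiased estimate); if bias is 1, normalization is by N.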
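A minimal sketch of both estimators on the transposed vector, which with the default rowvar means 4 variables with one observation each. The warning filter only silences the divide-by-zero RuntimeWarnings that the default estimator emits here.

```python
import warnings
import numpy as np

someVec = np.array([[1.0, 1.0, 2.0, -1.0]])
col = someVec.T        # default rowvar: 4 variables, 1 observation each

# Each variable's only observation equals its own mean, so all deviations are 0,
# and the default normalization divides by N - 1 = 0: every entry is 0/0 -> NaN.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    unbiased = np.cov(col)          # 4x4, all NaN

# bias=1 divides by N = 1 instead, giving a 4x4 matrix of zeros
biased = np.cov(col, bias=1)

print(np.isnan(unbiased).all())     # True
print(biased)
```

Note that bias only replaces the NaNs with zeros; a zero matrix has no principal directions, so PCA still needs many observations of the D-dimensional vector.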