I am implementing a PCA algorithm in MATLAB. I see two different approaches to calculating the covariance matrix:
C = sampleMat.' * sampleMat ./ nSamples;
and
C = cov(data);
What is the difference between these two methods?
PS 1: When I use cov(data), is the following centering step unnecessary?
meanSample = mean(data,1);
data = data - repmat(meanSample, nSamples, 1);
PS 2:
In the first approach, should I use nSamples or nSamples - 1?
In short:
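cov mainly just adds convenience to the bare formula. If you open its source (for example with edit cov), you'll see a lot of stuff; the lines all the way at the bottom subtract the column means from the input and then divide the product of the centered data with itself by either the number of samples or the number of samples minus one (the default),
which is essentially the same as your first line, save for the subtraction of the column-means.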
Read the wiki on sample covariances to see why there is a minus-one in the default path.
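To make the comparison concrete, here is a minimal sketch that checks the bare formula against cov on real-valued data. The example data and the names C_biased, C_unbiased, err_biased and err_unbiased are made up for illustration; meanSample, sampleMat and nSamples follow the question.

```matlab
% Hypothetical example data: 200 samples (rows) of 3 variables (columns).
rng(0);
data = randn(200, 3) * [2 0 0; 1 1 0; 0 0.5 3];
nSamples = size(data, 1);

% Center each column by subtracting its mean (the step from PS 1).
meanSample = mean(data, 1);
sampleMat  = data - repmat(meanSample, nSamples, 1);

% Bare formula with both normalizations (the choice from PS 2).
C_biased   = (sampleMat.' * sampleMat) ./ nSamples;        % divide by N
C_unbiased = (sampleMat.' * sampleMat) ./ (nSamples - 1);  % divide by N-1

% cov centers the data itself: cov(data) divides by N-1, cov(data, 1) by N.
err_unbiased = max(max(abs(C_unbiased - cov(data))));
err_biased   = max(max(abs(C_biased   - cov(data, 1))));
fprintf('max |C_unbiased - cov(data)|    = %g\n', err_unbiased);
fprintf('max |C_biased   - cov(data, 1)| = %g\n', err_biased);
```

Both differences should be at the level of floating-point round-off. So the manual centering is redundant when you call cov, and dividing by nSamples - 1 is what reproduces its default output.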
Note however that your first line uses normal transpose (.'), whereas the cov-version uses conjugate-transpose ('). This will make the output of cov different in the context of complex-valued data.

Also note that cov is a function call to a non-built-in function. That means that there will be a (possibly severe) performance penalty when using cov in a loop; Matlab's JIT compiler cannot accelerate non-built-in functions.
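The transpose difference mentioned above can be seen with a small sketch on complex-valued data. The data and the names z, zc, C_plain, C_conj, diff_plain and diff_conj are hypothetical and only serve the illustration.

```matlab
% Hypothetical complex-valued data: 50 samples (rows) of 2 variables.
rng(1);
z  = randn(50, 2) + 1i * randn(50, 2);
n  = size(z, 1);
zc = z - repmat(mean(z, 1), n, 1);   % centered data

% Same formula, once with each kind of transpose.
C_plain = (zc.' * zc) ./ (n - 1);    % plain transpose, no conjugation
C_conj  = (zc'  * zc) ./ (n - 1);    % conjugate transpose

% Compare both against cov, which conjugates one factor.
diff_plain = max(max(abs(C_plain - cov(z))));
diff_conj  = max(max(abs(C_conj  - cov(z))));
fprintf('plain transpose vs cov:     %g\n', diff_plain);
fprintf('conjugate transpose vs cov: %g\n', diff_conj);
```

Only the conjugate-transpose version should agree with cov to round-off. For real-valued data the two transposes are identical, so the distinction only matters once the data are complex.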