I’ve been searching everywhere and I’ve only found how to compute the covariance between one vector and another, like cov(xi, xj). What confuses me is how to get a covariance matrix from a cluster. Each cluster has many vectors. How do I combine them into one covariance matrix? Any suggestions?
Info:
Input: the vectors in a cluster, Xi = (x0,x1,…,xt), where e.g. x0 = {5 1 2 3 4} -> a column vector
(Actually each one is an MFCC feature vector with 12 coefficients. After clustering them with k-means into 8 clusters, I now want the covariance matrix of each cluster, to use as the covariance matrix in a Gaussian Mixture Model.)
Output: an n x n covariance matrix
The question you are asking is: given a set of N points of dimension D (e.g. the points you initially clustered as “speaker1”), fit a D-dimensional Gaussian to those points (which we will call “the Gaussian which represents speaker1”). To do so, merely calculate the sample mean and sample covariance of those points: http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Estimation_of_parameters or http://en.wikipedia.org/wiki/Sample_mean_and_covariance
Repeat for each of the remaining clusters, until all k=8 speakers have their own Gaussian. I believe you may be able to use a “non-parametric” stochastic process, or modify the algorithm (e.g. run it a few times on many speakers), to remove your assumption of k=8 speakers. Note that the standard k-means clustering algorithm (and other common algorithms like EM) is very sensitive to initialization and will give you different answers depending on how you initialize, so you may wish to perform appropriate regularization to penalize “bad” solutions as you discover them.
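As a rough illustration of the recipe above, here is a minimal NumPy sketch. The function name cluster_gaussians, the (N, D) row-per-vector layout, and the random stand-in data and labels are all assumptions made for the example; in practice X would hold your MFCC vectors and labels would come from your k-means run.

```python
import numpy as np

def cluster_gaussians(X, labels, k):
    """Return the sample mean and sample covariance of each cluster.

    X      : (N, D) array, one feature vector per row (e.g. D = 12 MFCCs)
    labels : (N,) array of cluster indices in 0..k-1 (e.g. from k-means)
    """
    D = X.shape[1]
    means = np.zeros((k, D))
    covs = np.zeros((k, D, D))
    for c in range(k):
        pts = X[labels == c]              # all vectors assigned to cluster c
        means[c] = pts.mean(axis=0)       # D-dimensional sample mean
        # rowvar=False: rows are observations, columns are variables.
        # np.cov divides by (n - 1) by default.
        covs[c] = np.cov(pts, rowvar=False)
    return means, covs

# Stand-in data: 400 random 12-dimensional vectors, 50 per cluster.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 12))
labels = np.arange(400) % 8               # placeholder for k-means labels
means, covs = cluster_gaussians(X, labels, k=8)
print(covs.shape)                         # (8, 12, 12)
```

Each covs[c] is the D x D matrix asked for in the question. A cluster needs at least two points for this to be defined, and noticeably more than D points if the matrix has to be invertible for use in a Gaussian Mixture Model.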
(below is my answer before you clarified your question)
Covariance is a property of two random variables; it is a rough measure of how much the two vary together.
A covariance matrix is merely a representation of the N×M separate covariances cov(x_i, y_j), one for each pair of elements from the sets X=(x1,x2,…,xN) and Y=(y1,y2,…,yM).
So the question boils down to: what are you actually trying to do with this “covariance matrix” you are searching for? Mel-Frequency Cepstral Coefficients… does each coefficient correspond to each note of an octave? You have chosen k=12 as the number of clusters you’d like? Are you basically trying to pick out notes in music?
I’m not sure how covariance generalizes to vectors, but I would guess that the covariance between two vectors x and y is just E[x dot y] - (E[x] dot E[y]) (basically replace multiplication with dot product), which would give you a scalar, one scalar per element of your covariance matrix. Then you would just stick this process inside two for-loops.
Or perhaps you could find the covariance matrix for each dimension separately. Without knowing exactly what you’re doing, though, one cannot give further advice than that.
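For what it is worth, here is one way the two-for-loops idea can be sketched when it is applied to the coordinates of the vectors, so that entry (i, j) is the scalar covariance between coordinate i and coordinate j across all vectors in the set. This is the usual sample covariance matrix rather than the dot-product guess above. The helper names scalar_cov and cov_matrix_loops and the random test data are made up for the illustration.

```python
import numpy as np

def scalar_cov(a, b):
    """Sample covariance of two equal-length 1-D samples (n - 1 normalization)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return ((a - a.mean()) * (b - b.mean())).sum() / (len(a) - 1)

def cov_matrix_loops(X):
    """Build the D x D covariance matrix of an (N, D) data set entry by entry."""
    n, d = X.shape
    C = np.empty((d, d))
    for i in range(d):
        for j in range(d):
            # covariance between coordinate i and coordinate j over all N vectors
            C[i, j] = scalar_cov(X[:, i], X[:, j])
    return C

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))              # 50 made-up vectors of dimension 5
C = cov_matrix_loops(X)
print(np.allclose(C, np.cov(X, rowvar=False)))   # True
```

The loops are only there to make the definition explicit; np.cov(X, rowvar=False) computes the same matrix in one call.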