Let’s assume I have two gmdistibution models that i obtained using
modeldata1=gmdistribution.fit(data1,1);
modeldata2=gmdistribution.fit(data2,1);
Now I have an unknown ‘data’ observation, and I want to see if it belongs to data1 or data2.
Based on my understanding of these functions, nlogn output using posterior,cluster, or pdf commands wouldn’t be a good measure since I am comparing ‘data’ to two different distributions.
What measure or output should I use find what is the p(data|modeldata1) and p(data|modeldata2) ?
Many thanks,
If I understand you correctly, you want to assign a new, unknown, datapoint to either class 1 or class 2 with the descriptors for each class (in this case the mean vector and covariance matrix) found by gmdistribution.fit.
In seeing this new datapoint, lets call it x, you should ask yourself what is
p(modeldata1 | x) and p(modeldata2 | x) and which ever one of these is the highest you should assign x to.
So how do you find these? You just apply Bayes rule and pick which ever one is the largest of:
Here you dont need to calculate p(x) as it is the same in each equation.
So, now you estimate the priors p(modeldata1) and p(modeldata2) by the number of training points from each class (or use some given information) and then calculate
where
dis the dimensionality of your data,Sigmais a corvariance matrix, andmuis a mean vector. This is then your asked for p(data|modeldata1). (Just remember to also use p(modeldata1) and p(modeldata2) when you do the classification).I know this was a bit unclear, but hopefully it can help you with a step in the right direction.
EDIT: Personally, I find a visualization such as the one below (takes from Pattern Recognition by Theodoridis and Koutroumbas). Here you have two gaussian mixtures with some priors and different covariance matrices. The blue area is where you would choose one class, while the gray area is where the other would be choosen.