I’ve been studying k-means clustering, and one thing that’s not clear to me is how you choose the value of k. Is it just a matter of trial and error, or is there more to it?
You can maximize the Bayesian Information Criterion (BIC):

$$\mathrm{BIC}(C \mid X) = L(X \mid C) - \frac{p}{2} \log n$$

where L(X | C) is the log-likelihood of the dataset X according to model C, p is the number of parameters in the model C, and n is the number of points in the dataset. See “X-means: extending K-means with efficient estimation of the number of clusters” by Dan Pelleg and Andrew Moore in ICML 2000.
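To make the BIC recipe concrete, here is a minimal sketch in plain Python. It is not the X-means algorithm itself (which splits clusters locally); it simply runs k-means for several values of k and keeps the one with the highest BIC. Several things here are my assumptions rather than anything stated above: each cluster is modelled as a spherical Gaussian with one shared variance, points are hard-assigned to their centroid, the parameter count is p = (k - 1) + k*d + 1, and the seeding is a deterministic farthest-first rule. The function names are made up for illustration.

```python
import math

def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first point, then repeatedly add the point farthest from all seeds."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    return seeds

def kmeans(points, k, iters=100):
    """Lloyd's algorithm on a list of tuples; returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

def bic(points, centroids, labels):
    """BIC = L(X | C) - (p / 2) * log(n) under the assumed spherical model."""
    n, d, k = len(points), len(points[0]), len(centroids)
    sse = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    var = sse / (d * (n - k))  # pooled variance shared by all clusters
    counts = [labels.count(j) for j in range(k)]
    # Log-likelihood: mixing weight of each point's cluster plus Gaussian density.
    log_lik = sum(math.log(counts[lab] / n) for lab in labels)
    log_lik -= 0.5 * n * d * math.log(2 * math.pi * var) + sse / (2 * var)
    p = (k - 1) + k * d + 1  # weights + centroid coordinates + variance
    return log_lik - 0.5 * p * math.log(n)

def choose_k(points, k_values):
    """Return (best k, {k: BIC}) over the candidate values of k."""
    scores = {k: bic(points, *kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores
```

Calling `choose_k(points, range(1, 7))` on a list of coordinate tuples returns the best k together with the score for each candidate, so you can also inspect how sharply the BIC peaks. The sketch needs more points than clusters and a non-zero within-cluster spread, otherwise the variance estimate breaks down.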
Another approach is to start with a large value for k and keep removing centroids (reducing k) until doing so no longer reduces the description length. See “MDL principle for robust vector quantisation” by Horst Bischof, Ales Leonardis, and Alexander Selb in Pattern Analysis and Applications vol. 2, pp. 59-72, 1999.

Finally, you can start with one cluster, then keep splitting clusters until the points assigned to each cluster have a Gaussian distribution. In “Learning the k in k-means” (NIPS 2003), Greg Hamerly and Charles Elkan show some evidence that this works better than BIC, and that BIC does not penalize the model’s complexity strongly enough.
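The splitting idea can be sketched roughly as follows. This is a simplified, recursive take on it and not a faithful reimplementation of the paper: it splits a cluster with a small 2-means run, projects the points onto the line joining the two child centres, and applies an Anderson-Darling normality test to that 1-D projection. It skips the step of re-running k-means over all centres after each split. The default threshold of 1.8692 is meant as the critical value for a very strict significance level and should be treated as an assumption; look up the value for the level you want. All function names are hypothetical.

```python
import math
from statistics import NormalDist

def mean_point(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def two_means(points, iters=20):
    """Split one cluster into two children with a few Lloyd iterations."""
    centre = mean_point(points)
    far = max(points, key=lambda p: math.dist(p, centre))
    # Seed the children at the farthest point and its mirror image (assumption).
    c = [far, tuple(2 * m - f for m, f in zip(centre, far))]
    halves = [points, []]
    for _ in range(iters):
        halves = [[], []]
        for p in points:
            idx = 1 if math.dist(p, c[1]) < math.dist(p, c[0]) else 0
            halves[idx].append(p)
        if not halves[0] or not halves[1]:
            break  # degenerate split, give up
        c = [mean_point(h) for h in halves]
    return c, halves

def anderson_darling(xs):
    """A^2 normality statistic, corrected for estimated mean and variance."""
    n = len(xs)
    mu = sum(xs) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / (n - 1))
    cdf = NormalDist().cdf
    # Clamp to avoid log(0) for extreme values.
    z = sorted(min(max(cdf((x - mu) / sd), 1e-12), 1 - 1e-12) for x in xs)
    s = sum((2 * i + 1) * (math.log(z[i]) + math.log(1 - z[n - 1 - i]))
            for i in range(n))
    a2 = -n - s / n
    return a2 * (1 + 4 / n - 25 / n ** 2)

def g_means(points, critical=1.8692, min_size=8):
    """Recursively split until every cluster passes the normality test."""
    if len(points) < min_size:
        return [points]
    (c0, c1), halves = two_means(points)
    if not halves[0] or not halves[1]:
        return [points]
    v = [a - b for a, b in zip(c0, c1)]  # direction joining the child centres
    proj = [sum(pi * vi for pi, vi in zip(p, v)) for p in points]
    if anderson_darling(proj) <= critical:
        return [points]  # consistent with a single Gaussian: keep as is
    return (g_means(halves[0], critical, min_size)
            + g_means(halves[1], critical, min_size))
```

`g_means(points)` returns a list of clusters (each a list of points), so `len(g_means(points))` is the estimated k. Raising `critical` makes the test more reluctant to split; lowering it produces more clusters.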