I’m trying to determine the right number of clusters for clustering some data.
I know that this is possible using the Davies–Bouldin Index (DBI).
To use DBI, you compute it for each candidate number of clusters; the number that minimizes the DBI is taken as the right number of clusters.
The question is:
How can I tell whether 2 clusters are better than 1 cluster using DBI? In other words, how can I compute DBI when I have just 1 cluster?
Considering only the average DBI over all clusters is apparently not a good idea.
Increasing the number of clusters k without any penalty tends to reduce the DBI of the resulting clustering, down to the extreme case of zero DBI when each data point is its own cluster (because each data point coincides with its own centroid). So it is hard to say which clustering is better if you use only the average DBI as the performance metric.
A good practical method is to use the Elbow method.
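
To make the two extremes concrete, here is a minimal pure-Python sketch of the usual DBI definition (average over clusters of the worst ratio of summed scatters to centroid distance). It assumes Euclidean distance, centroid-based scatter, and distinct centroids; the function name `davies_bouldin` and the toy points are only illustrations, not part of any library. Under this definition each cluster is compared with another cluster, so the index has no value for a single cluster, which is why "1 vs. 2 clusters" cannot be decided by DBI alone.

```python
import math
from collections import defaultdict

def davies_bouldin(points, labels):
    """Davies-Bouldin index (hypothetical helper, Euclidean distance assumed)."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    if len(clusters) < 2:
        # Each cluster is compared with its most similar *other* cluster,
        # so the index is undefined for a single cluster.
        raise ValueError("DBI needs at least 2 clusters")

    # Centroid and scatter (mean distance of members to the centroid) per cluster.
    centroids, scatters = [], []
    for members in clusters.values():
        c = [sum(col) / len(members) for col in zip(*members)]
        centroids.append(c)
        scatters.append(sum(math.dist(p, c) for p in members) / len(members))

    k = len(centroids)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i against every other cluster.
        # Assumes no two centroids coincide (otherwise this divides by zero).
        total += max(
            (scatters[i] + scatters[j]) / math.dist(centroids[i], centroids[j])
            for j in range(k) if j != i
        )
    return total / k

# Toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]

dbi_two = davies_bouldin(points, [0, 0, 1, 1])         # two natural clusters
dbi_singletons = davies_bouldin(points, [0, 1, 2, 3])  # every point its own cluster

print(dbi_two)         # 0.2
print(dbi_singletons)  # 0.0, although this "clustering" says nothing about the data

try:
    davies_bouldin(points, [0, 0, 0, 0])  # a single cluster
except ValueError as err:
    print(err)  # DBI needs at least 2 clusters
```

The singleton clustering scores a perfect 0.0, which illustrates why the minimum DBI alone can be misleading when k is allowed to grow without penalty.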
Some other good alternatives for choosing the optimal number of clusters: