I’m using the Trickl-Cluster project to cluster my data set
and Colt to store the data objects in matrices.
After executing this code:
import cern.colt.matrix.DoubleMatrix2D;
import cern.colt.matrix.impl.DenseDoubleMatrix2D;
import com.trickl.cluster.KMeans;

public class KMeansExample {
    public static void main(String[] args) {
        // Fill a 3 x 3 matrix with the data
        DoubleMatrix2D dm1 = new DenseDoubleMatrix2D(3, 3);
        dm1.setQuick(0, 0, 5.9);
        dm1.setQuick(0, 1, 1.6);
        dm1.setQuick(0, 2, 18.0);
        dm1.setQuick(1, 0, 2.0);
        dm1.setQuick(1, 1, 3.5);
        dm1.setQuick(1, 2, 20.3);
        dm1.setQuick(2, 0, 11.5);
        dm1.setQuick(2, 1, 100.5);
        dm1.setQuick(2, 2, 6.5);
        System.out.println(dm1);

        // Run k-means asking for 1 cluster
        KMeans km = new KMeans();
        km.cluster(dm1, 1);

        // Print the partition and the means
        DoubleMatrix2D dm11 = km.getPartition();
        System.out.println(dm11);
        DoubleMatrix2D dm111 = km.getMeans();
        System.out.println(dm111);
    }
}
I got the following output:
3 x 3 matrix
5.9 1.6 18
2 3.5 20.3
11.5 100.5 6.5
3 x 1 matrix
1
1
1
3 x 1 matrix
6.466667
35.2
14.933333
Following the algorithm steps, it seems strange to ask for 1 cluster and get 3 means.
The documentation is not very clear on this point.
This is the signature of the cluster method according to the project's Javadoc:
void cluster(cern.colt.matrix.DoubleMatrix2D data, int clusters)
So, logically, the int clusters parameter is the number of clusters expected once k-means terminates.
Do you have any idea how the outputs of the KMeans class in this project relate to the expected results of the k-means algorithm?
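This is one three-dimensional mean, not three means. If you put in three-dimensional data, you get out three-dimensional means: the 3 x 1 matrix returned by getMeans() holds the three coordinates of the single cluster center.
Note that running k-means with k=1 is pointless, as it will simply compute the mean of the data set, i.e. the average of each column over all rows.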
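As a rough check, the means can be recomputed by hand without any clustering library. The sketch below uses plain arrays and a hypothetical helper class, ColumnMeanCheck, which is not part of Trickl-Cluster or Colt. It assumes each row is one data point and each column is one dimension, which is the reading that matches the output in the question.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Trickl-Cluster or Colt.
class ColumnMeanCheck {

    // Averages each column over all rows
    // (assumption: rows are data points, columns are dimensions).
    static double[] columnMeans(double[][] data) {
        int dims = data[0].length;
        double[] means = new double[dims];
        for (double[] row : data) {
            for (int j = 0; j < dims; j++) {
                means[j] += row[j];
            }
        }
        for (int j = 0; j < dims; j++) {
            means[j] /= data.length;
        }
        return means;
    }

    public static void main(String[] args) {
        // Same values as the 3 x 3 matrix in the question
        double[][] data = {
            {5.9, 1.6, 18.0},
            {2.0, 3.5, 20.3},
            {11.5, 100.5, 6.5}
        };
        System.out.println(Arrays.toString(columnMeans(data)));
    }
}
```

Up to floating-point rounding, the three values agree with the 6.466667, 35.2 and 14.933333 printed by getMeans(): (5.9 + 2.0 + 11.5) / 3, (1.6 + 3.5 + 100.5) / 3 and (18.0 + 20.3 + 6.5) / 3.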
The result is obviously correct.