The problem is as follows. I have M images and extract N features from each image, and each feature has dimensionality L. Thus, I have M*N features (2,000,000 in my case), each of dimensionality L (100 in my case). I need to cluster these M*N features into K clusters. How can I do it? Thanks.
Do you want 1000 clusters of images, of features, or of (image, feature) pairs?
In any case, it sounds as though you’ll have to reduce the data
and use simpler methods.
One possibility is two-pass K-clustering:
a) split the 2 million data points into 32 clusters,
b) split each of these into 32 more.
If this works, the resulting 32^2 = 1024 clusters might be good enough for your purpose.
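The two passes above can be sketched roughly as follows. This is only an illustration: it assumes plain k-means (Lloyd's algorithm) as the clustering method, and the names `kmeans`, `two_pass_kmeans`, `k1` and `k2` are made up for this example. It is pure Python on toy data; for 2 million points you would want a vectorized or mini-batch implementation instead.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    k = min(k, len(points))            # cannot have more clusters than points
    centers = rng.sample(points, k)    # initialise centers from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels  # labels from the last assignment step

def two_pass_kmeans(points, k1, k2, seed=0):
    """Pass a) split into k1 coarse clusters; pass b) split each into k2."""
    coarse = kmeans(points, k1, seed=seed)
    final = [0] * len(points)
    for c in range(k1):
        idx = [i for i, lab in enumerate(coarse) if lab == c]
        if not idx:
            continue
        fine = kmeans([points[i] for i in idx], k2, seed=seed + 1 + c)
        for i, f in zip(idx, fine):
            final[i] = c * k2 + f      # unique id in range(k1 * k2)
    return final

# Toy usage: 400 random 5-dimensional points, 4 x 4 = 16 clusters at most.
rng = random.Random(1)
data = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(400)]
labels = two_pass_kmeans(data, k1=4, k2=4)
```

With `k1 = k2 = 32` the same scheme gives the 1024 clusters mentioned above. Each second-pass run only sees the points of one coarse cluster, which is what keeps the cost down.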
Then, do you really need 100 coordinates?
Could you guess the 20 most important ones,
or just try random subsets of 20?
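Both ideas are easy to try before clustering. A minimal sketch, with assumptions: "most important" is approximated here by highest variance, which is only one possible guess and not something you have to use, and the function names are invented for this example.

```python
import random
import statistics

def random_coordinates(dim, n_keep, seed=0):
    # Pick n_keep distinct coordinate indices uniformly at random.
    return sorted(random.Random(seed).sample(range(dim), n_keep))

def top_variance_coordinates(features, n_keep):
    # Assumption: treat high-variance coordinates as the "important" ones.
    dim = len(features[0])
    variances = [statistics.pvariance([row[j] for row in features])
                 for j in range(dim)]
    ranked = sorted(range(dim), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:n_keep])

def project(features, keep):
    # Keep only the chosen coordinates of every feature vector.
    return [[row[j] for j in keep] for row in features]

# Toy usage: 50 vectors with 100 coordinates; coordinate 3 is made noisier.
rng = random.Random(2)
feats = [[rng.gauss(0, 1) for _ in range(100)] for _ in range(50)]
for row in feats:
    row[3] *= 10.0

keep_random = random_coordinates(100, 20)
keep_var = top_variance_coordinates(feats, 20)
reduced = project(feats, keep_random)
```

If the clusters you get from several different random subsets look similar, that is some evidence that 20 coordinates are enough.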
There’s a huge literature: a Google search for
+image "dimension reduction" gives ~70,000 hits.