I’m working on a program that takes a small number (<50) of high-dimensional points in feature space (1000+ dimensions) and performs hierarchical clustering on them by recursively applying standard k-clustering.
My problem is that in any one k-clustering pass, different parts of the high-dimensional representation are redundant. I know this problem falls under the umbrella of feature extraction, selection, or weighting.
In general, what should one take into account when choosing a particular feature extraction/selection/weighting algorithm? And specifically, which algorithm would be the best way to prepare my data for clustering in my situation?
Check out this paper:
Witten DM, Tibshirani R (2010). A framework for feature selection in clustering. Journal of the American Statistical Association 105(490): 713-726.
Also see the related COSA paper (Clustering Objects on Subsets of Attributes) by Friedman and Meulman. Both discuss these issues in depth.
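To give a rough feel for the feature-weighting idea, here is a minimal sketch in Python with NumPy. It is not the algorithm from either paper. It only illustrates one ingredient in the spirit of Witten and Tibshirani's approach: given cluster labels, score each feature by its between-cluster sum of squares, soft-threshold the scores so uninformative features get weight zero, and rescale. The function names, the synthetic data, and the fixed threshold `delta` are all assumptions made for illustration; as I understand it, the paper instead tunes the threshold to satisfy an L1 constraint on the weights and alternates this step with re-clustering.

```python
import numpy as np

def per_feature_bcss(X, labels):
    """Between-cluster sum of squares for each feature, given cluster labels."""
    overall_mean = X.mean(axis=0)
    bcss = np.zeros(X.shape[1])
    for k in np.unique(labels):
        members = X[labels == k]
        bcss += len(members) * (members.mean(axis=0) - overall_mean) ** 2
    return bcss

def feature_weights(bcss, delta):
    """Soft-threshold the per-feature scores and rescale to unit L2 norm.

    delta is a hypothetical fixed threshold chosen by hand for this sketch.
    """
    shrunk = np.maximum(bcss - delta, 0.0)
    norm = np.linalg.norm(shrunk)
    return shrunk / norm if norm > 0 else shrunk

# Synthetic stand-in for the data in the question: 40 points, 1000 features,
# where only the first 5 features actually separate the two groups.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 1000))
labels = np.repeat([0, 1], 20)
X[labels == 1, :5] += 4.0

bcss = per_feature_bcss(X, labels)
# Assumption: zero out roughly the 95% lowest-scoring features.
weights = feature_weights(bcss, delta=np.percentile(bcss, 95))
```

The resulting `weights` could then be used to rescale the columns of `X` (for example by `np.sqrt(weights)`) before the next k-clustering pass, so that redundant dimensions contribute little or nothing to the distances. In a recursive setup you would recompute the weights within each sub-cluster, since different features may matter at different levels.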