So basically, I use the Python module scipy-cluster to plot a lot of data points. Is there a way or function that gives the representative of each cluster, given either the threshold or the number of representatives I want? Ideally, each representative should be the point closest to the center of the cluster it belongs to.
Edit: I’m looking for the data point closest to the centroid in each cluster.
Scipy-cluster provides the coordinates of each centroid and identifies which points are in each cluster. Once you have that, I believe scipy.cluster.vq.py_vq will give you the distance between each observation and its nearest centroid, so the representative of a cluster is simply the member with the smallest distance.
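Here is a minimal sketch of that idea, assuming a k-means style codebook. It uses `scipy.cluster.vq.vq` (the compiled counterpart of `py_vq`, which returns the same two arrays). The helper name `closest_to_centroids` and the toy data are made up for illustration, so adapt them to your own observations.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq


def closest_to_centroids(obs, centroids):
    """Hypothetical helper: for each centroid, return the index of the
    observation in its cluster that lies closest to it (-1 if the
    cluster has no members)."""
    # labels[i] is the nearest centroid of obs[i]; dists[i] is that distance
    labels, dists = vq(obs, centroids)
    reps = np.full(len(centroids), -1, dtype=int)
    for k in range(len(centroids)):
        members = np.flatnonzero(labels == k)
        if members.size:
            reps[k] = members[np.argmin(dists[members])]
    return reps


# Toy data (assumption): two well-separated groups of three points each
obs = np.array([
    [0.0, 0.0], [0.1, 0.0], [1.0, 0.0],
    [10.0, 10.0], [10.2, 10.0], [11.0, 10.0],
])

# Passing an array as the initial guess keeps the run deterministic
initial_guess = np.array([[0.0, 0.0], [10.0, 10.0]])
centroids, _ = kmeans(obs, initial_guess)

reps = closest_to_centroids(obs, centroids)
print("representative indices:", reps)
print("representative points:\n", obs[reps])
```

If you got your clusters from hierarchical clustering with a threshold instead, the same approach should work as long as you compute each cluster's mean yourself and pass those means as `centroids`.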