MATLAB has a nice silhouette function to help evaluate the number of clusters for k-means. Is there an equivalent for Python's NumPy/SciPy as well?
I present below a sample silhouette implementation in both MATLAB and Python/NumPy (keep in mind that I am more fluent in MATLAB):
1) MATLAB
To emulate the plot from the silhouette function in MATLAB, we group the silhouette values by cluster, sort within each, then plot the bars horizontally. MATLAB adds
NaNs to separate the bars from the different clusters; I found it easier to simply color-code the bars.
2) Python
And here is what I came up with in Python:
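A minimal NumPy sketch of such an implementation follows. It is an illustration under stated assumptions, not the original code from this answer: the function names `silhouette` and `plot_order` are hypothetical, plain Euclidean distance is assumed (other implementations may default to a different metric), and single-member clusters are assigned a value of 0 by convention.

```python
import numpy as np


def silhouette(X, labels):
    """Per-sample silhouette values for X of shape (n_samples, n_features).

    Assumes plain Euclidean distance; this is a choice made for the sketch.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if clusters.size < 2:
        raise ValueError("silhouette needs at least two clusters")

    n = X.shape[0]
    # Full pairwise distance matrix, shape (n, n); fine for small data sets.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))

    a = np.zeros(n)          # mean distance to the other members of the own cluster
    b = np.full(n, np.inf)   # smallest mean distance to any other cluster
    singleton = np.zeros(n, dtype=bool)
    for k in clusters:
        in_k = labels == k
        size = in_k.sum()
        sums = D[:, in_k].sum(axis=1)  # distance sums from every sample to cluster k
        if size > 1:
            a[in_k] = sums[in_k] / (size - 1)  # exclude the zero self-distance
        else:
            singleton[in_k] = True
        b[~in_k] = np.minimum(b[~in_k], sums[~in_k] / size)

    # s = (b - a) / max(a, b); singletons and zero denominators are left at 0.
    denom = np.maximum(a, b)
    ok = (denom > 0) & ~singleton
    s = np.zeros(n)
    s[ok] = (b[ok] - a[ok]) / denom[ok]
    return s


def plot_order(s, labels):
    """Indices that group samples by cluster, sorted by descending silhouette within each."""
    # np.lexsort treats the last key as the primary sort key.
    return np.lexsort((-np.asarray(s), np.asarray(labels)))
```

Passing `s[plot_order(s, labels)]` to matplotlib's `barh`, with one color per cluster, should give a plot in the spirit of the grouped, sorted, horizontal bars described for MATLAB.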
Update:
As noted by others, scikit-learn has since added its own silhouette metric implementation. To use it in the above code, replace the call to the custom-defined
silhouette function with the scikit-learn equivalent. The rest of the code can still be used as-is to generate the exact same plot.
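For reference, a minimal sketch of the scikit-learn call. The variable names `X` and `labels` and the toy data are placeholders chosen for illustration, not taken from the code above. `silhouette_samples` returns one value per sample (what a silhouette plot needs), while `silhouette_score` returns their mean; both use Euclidean distance unless a different `metric` is passed.

```python
import numpy as np
from sklearn.metrics import silhouette_samples, silhouette_score

# Toy data: two well-separated pairs of points (placeholder values).
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
labels = np.array([0, 0, 1, 1])

# Per-sample silhouette values, one entry per row of X.
s = silhouette_samples(X, labels)

# Mean silhouette over all samples, handy for comparing different cluster counts.
mean_s = silhouette_score(X, labels)
```

When comparing against another implementation, check that both use the same distance metric, since the silhouette values depend on it.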