Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 6169801
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 23, 20262026-05-23T22:55:09+00:00 2026-05-23T22:55:09+00:00

MATLAB has a nice silhouette function to help evaluate the number of clusters for

  • 0

MATLAB has a nice silhouette function to help evaluate the number of clusters for k-means. Is there an equivalent for Python’s Numpy/Scipy as well?

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-23T22:55:10+00:00Added an answer on May 23, 2026 at 10:55 pm

    I present below a sample silhouette implementation in both MATLAB and Python/Numpy (keep in mind that I am more fluent in MATLAB):

    1) MATLAB

    function s = mySilhouette(X, IDX)
        %# X  : matrix of size N-by-p, data where rows are instances
        %# IDX: vector of size N, cluster index of each instance (starting from 1)
        %# s  : vector of size N, silhouette score value of each instance
    
        N = size(X,1);            %# number of instances
        K = numel(unique(IDX));   %# number of clusters
    
        %# compute pairwise distance matrix
        D = squareform( pdist(X,'euclidean').^2 );
    
        %# indices belonging to each cluster
        kIndices = accumarray(IDX, 1:N, [K 1], @(x){sort(x)});
    
        %# compute a,b,s for each instance
        %# a(i): average distance from i to all other data within the same cluster.
        %# b(i): lowest average dist from i to the data of another single cluster
        a = zeros(N,1);
        b = zeros(N,1);
        for i=1:N
            ind = kIndices{IDX(i)}; ind = ind(ind~=i);
            a(i) = mean( D(i,ind) );
            b(i) = min( cellfun(@(ind) mean(D(i,ind)), kIndices([1:K]~=IDX(i))) );
        end
        s = (b-a) ./ max(a,b);
    end
    

    To emulate the plot from the silhouette function in MATLAB, we group the silhouette values by cluster, sort within each, then plot the bars horizontally. MATLAB adds NaNs to separate the bars from the different clusters, I found it easier to simply color-code the bars:

    %# sample data
    load fisheriris
    X = meas;
    N = size(X,1);
    
    %# cluster and compute silhouette score
    K = 3;
    [IDX,C] = kmeans(X, K, 'distance','sqEuclidean');
    s = mySilhouette(X, IDX);
    
    %# plot
    [~,ord] = sortrows([IDX s],[1 -2]);
    indices = accumarray(IDX(ord), 1:N, [K 1], @(x){sort(x)});
    ytick = cellfun(@(ind) (min(ind)+max(ind))/2, indices);
    ytickLabels = num2str((1:K)','%d');           %#'
    
    h = barh(1:N, s(ord),'hist');
    set(h, 'EdgeColor','none', 'CData',IDX(ord))
    set(gca, 'CLim',[1 K], 'CLimMode','manual')
    set(gca, 'YDir','reverse', 'YTick',ytick, 'YTickLabel',ytickLabels)
    xlabel('Silhouette Value'), ylabel('Cluster')
    
    %# compare against SILHOUETTE
    figure, silhouette(X,IDX)
    

    mySilhouette
    silhouette


    2) Python

    And here is what I came up with in Python:

    import numpy as np
    from scipy.cluster.vq import kmeans2
    from scipy.spatial.distance import pdist, squareform
    from sklearn import datasets
    import matplotlib.pyplot as plt
    from matplotlib import cm
    
    def silhouette(X, cIDX):
        """
        Computes the silhouette score for each instance of a clustered dataset,
        which is defined as:
            s(i) = (b(i)-a(i)) / max{a(i),b(i)}
        with:
            -1 <= s(i) <= 1
    
        Args:
            X    : A M-by-N array of M observations in N dimensions
            cIDX : array of len M containing cluster indices (starting from zero)
    
        Returns:
            s    : silhouette value of each observation
        """
    
        N = X.shape[0]              # number of instances
        K = len(np.unique(cIDX))    # number of clusters
    
        # compute pairwise distance matrix
        D = squareform(pdist(X))
    
        # indices belonging to each cluster
        kIndices = [np.flatnonzero(cIDX==k) for k in range(K)]
    
        # compute a,b,s for each instance
        a = np.zeros(N)
        b = np.zeros(N)
        for i in range(N):
            # instances in same cluster other than instance itself
            a[i] = np.mean( [D[i][ind] for ind in kIndices[cIDX[i]] if ind!=i] )
            # instances in other clusters, one cluster at a time
            b[i] = np.min( [np.mean(D[i][ind]) 
                            for k,ind in enumerate(kIndices) if cIDX[i]!=k] )
        s = (b-a)/np.maximum(a,b)
    
        return s
    
    def main():
        # load Iris dataset
        data = datasets.load_iris()
        X = data['data']
    
        # cluster and compute silhouette score
        K = 3
        C, cIDX = kmeans2(X, K)
        s = silhouette(X, cIDX)
    
        # plot
        order = np.lexsort((-s,cIDX))
        indices = [np.flatnonzero(cIDX[order]==k) for k in range(K)]
        ytick = [(np.max(ind)+np.min(ind))/2 for ind in indices]
        ytickLabels = ["%d" % x for x in range(K)]
        cmap = cm.jet( np.linspace(0,1,K) ).tolist()
        clr = [cmap[i] for i in cIDX[order]]
    
        fig = plt.figure()
        ax = fig.add_subplot(111)
        ax.barh(range(X.shape[0]), s[order], height=1.0, 
                edgecolor='none', color=clr)
        ax.set_ylim(ax.get_ylim()[::-1])
        plt.yticks(ytick, ytickLabels)
        plt.xlabel('Silhouette Value')
        plt.ylabel('Cluster')
        plt.show()
    
    if __name__ == '__main__':
        main()
    

    python_mySilhouette


    Update:

    As noted by others, scikit-learn has since then added its own silhouette metric implementation. To use it in the above code, replace the call to the custom-defined silhouette function with:

    from sklearn.metrics import silhouette_samples
    
    ...
    
    #s = silhouette(X, cIDX)
    s = silhouette_samples(X, cIDX)    # <-- scikit-learn function
    
    ...
    

    the rest of the code can still be used as-is to generate the exact same plot.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I know Matlab has a function called cylinder to create the points for a
In matlab there is a special function which is not available in any of
Is there an idiomatic way to bind variables in a MATLAB function? It seems
Matlab has a nice property that scalar functions (such as sin) can work on
I have a function in Matlab that has some variables in it. I need
I have never used Python before, most of my programming has been in MATLAB
What is the equivalent of MATLAB's quadprog function in R in terms of function
I have a matlab function that returns results in a uitable. There are 2
Matlab's cross-correlation function xcorr(x,y,maxlags) has an option maxlag , which returns the cross-correlation sequence
This question kind of starts where this question ends up. MATLAB has a powerful

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.