The Archive Base

Editorial Team
Asked: May 27, 2026 (2026-05-27T15:24:59+00:00)


Possible Duplicate:
Python k-means algorithm

I want to cluster 10,000 indexed points based on their feature vectors and get their IDs after clustering, i.e. cluster1: [p1, p3, p100, …], cluster2: […], …

Is there any way to do this in Python? Thanks!

P.S. The indexed points are stored in a 10000×10 matrix, where each row is one point's feature vector.
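
To make the requested output concrete, here is a minimal sketch. It assumes each point's ID is simply its row index in the matrix and that some clustering step has already produced one label per row. The `labels` list and the `group_ids_by_cluster` helper are made up for illustration.

```python
from collections import defaultdict


def group_ids_by_cluster(labels):
    """Map each cluster label to the list of point IDs (row indices) that carry it."""
    groups = defaultdict(list)
    for point_id, label in enumerate(labels):
        groups[label].append(point_id)
    return dict(groups)


# Made-up labels for six points, purely for illustration.
labels = [0, 1, 0, 2, 1, 0]
print(group_ids_by_cluster(labels))  # {0: [0, 2, 5], 1: [1, 4], 2: [3]}
```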


1 Answer

  1. Editorial Team
    Added an answer on May 27, 2026 at 3:24 pm

    Use a clustering algorithm. I've included an implementation of the k-means algorithm that @Cameron linked to in his second comment, but you might also want to look at the link in his first comment. Note that it works on (x,y) points and returns only the cluster centers. I'm not sure what you mean by "get their IDs"; could you elaborate?

    from math import sqrt
    from random import shuffle  # needed for the random initialisation below
    
    def k_means(data_pts, k=None):
        """ Return k (x,y) pairs where:
                k = number of clusters
            and each
                (x,y) pair = centroid of cluster
    
            data_pts should be a list of (x,y) tuples, e.g.,
                data_pts=[ (0,0), (0,5), (1,3) ]
        """
    
        """ Helper functions """
        def lists_are_same(la, lb): # see if two collections hold exactly the same elements
            return set(la) == set(lb)
        def distance(a, b): # distance between (x,y) points a and b
            return sqrt(abs(a[0]-b[0])**2 + abs(a[1]-b[1])**2)
        def average(a): # return the average of a one-dimensional list (e.g., [1, 2, 3])
            return sum(a)/float(len(a))
    
        """ Set up some initial values """
        if k is None: # if the user didn't supply a number of means to look for, try to estimate how many there are
            n = len(data_pts)# number of points in the dataset
            k = int(sqrt(n/2))  # number of clusters - see
                            #   http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set#Rule_of_thumb
        if k < 1: # make sure there's at least one cluster
            k = 1
    
    
    
        """ Randomly generate k clusters and determine the cluster centers,
            or directly generate k random points as cluster centers. """
    
        init_clusters = data_pts[:]         # put all of the data points into clusters
        shuffle(init_clusters)          # put the data points in random order
        init_clusters = init_clusters[0:k]  # only keep the first k random clusters
    
        old_clusters, new_clusters = {}, {} 
        for item in init_clusters:
            old_clusters[item] = [] # every cluster has a list of points associated with it. Initially, it's 0
    
        while 1: # just keep going forever, until our break condition is met
            tmp = {}
            for center in old_clusters: # create an editable version of the old_clusters dictionary (avoid reusing the name k)
                tmp[center] = []
    
            """ Associate each point with the closest cluster center. """
            for point in data_pts: # for each (x,y) data point
                min_clust = None
                min_dist = float('inf') # larger than any real distance, so the first cluster checked always becomes the candidate
                for pc in tmp: # for every possible closest cluster
                    pc_dist = distance(point, pc)
                    if pc_dist < min_dist: # if this cluster is the closest, have it be the closest (duh)
                        min_dist = pc_dist
                        min_clust = pc
                tmp[min_clust].append(point) # add each point to its closest cluster's list of associated points
    
            """ Recompute the new cluster centers. """
            for center in tmp:
                associated = tmp[center]
                if not associated: # no points were assigned: keep the old center instead of averaging an empty list
                    new_clusters[center] = associated
                    continue
                xs = [pt[0] for pt in associated] # build up a list of x's
                ys = [pt[1] for pt in associated] # build up a list of y's
                x = average(xs) # x coordinate of new cluster
                y = average(ys) # y coordinate of new cluster
                new_clusters[(x,y)] = associated # these are the points the center was built off of, they're *probably* still associated
    
            if lists_are_same(old_clusters.keys(), new_clusters.keys()): # if we've reached equilibrium, return the points
                return list(old_clusters.keys()) # a plain list of (x,y) centroids rather than a dict view
            else: # otherwise, we'll go another round. let old_clusters = new_clusters, and clear new_clusters.
                old_clusters = new_clusters
                new_clusters = {}
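
The function above handles only (x,y) points and returns centroids, while the question asks for the IDs of 10-dimensional points. The following is a rough sketch of one way to get there, not the method from the linked comments. It assumes a point's ID is its row index, and the name `k_means_ids` and its `max_iter` and `seed` parameters are hypothetical.

```python
import random
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means_ids(rows, k, max_iter=100, seed=0):
    """Cluster `rows` (equal-length feature vectors) into k groups.

    Returns {cluster_number: [row indices]}; the row index stands in
    for the point's ID. Raises ValueError if k > len(rows).
    """
    rng = random.Random(seed)
    # Pick k distinct rows as the starting centers.
    centers = [list(rows[i]) for i in rng.sample(range(len(rows)), k)]
    assignment = []
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest center.
        new_assignment = [
            min(range(k), key=lambda c: euclidean(row, centers[c]))
            for row in rows
        ]
        if new_assignment == assignment:  # nothing moved: converged
            break
        assignment = new_assignment
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [rows[i] for i, a in enumerate(assignment) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    # Collect the row indices (IDs) that ended up in each cluster.
    clusters = {c: [] for c in range(k)}
    for i, a in enumerate(assignment):
        clusters[a].append(i)
    return clusters


# Six made-up 3-dimensional points forming two obvious groups.
rows = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (10, 10, 10), (10, 11, 10), (11, 10, 10)]
clusters = k_means_ids(rows, 2)
for label, ids in clusters.items():
    print(label, ids)
```

The same call should work on a 10000×10 matrix passed as a list of rows, although a pure-Python loop like this will be slow at that size and a NumPy-based implementation would likely be a better fit.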
    