I’m writing a k-means clustering implementation for a project that uses clusters to identify objects, so that the robot can be almost fully autonomous. The camera captures a picture every half second, which is stored as a ‘blob’ of pixels. This blob is sent to the data mining algorithm, k-means, which identifies the ‘shades’ of the object as a cluster, so the robot can be programmed to avoid those areas. I’m posting my k-means code below; it’s written in Python.
import sys, math, random

class Point:
    def __init__(self, coords, reference=None):
        self.coords = coords
        self.n = len(coords)
        self.reference = reference
    def __repr__(self):
        return str(self.coords)

class Cluster:
    def __init__(self, points):
        if len(points) == 0:
            raise Exception("ILLEGAL: empty cluster")
        self.points = points
        self.n = points[0].n # make the first element to be the number of clusters
        for p in points:
            if p.n != self.n:
                raise Exception("ILLEGAL: wrong dimension")
        self.centroid = self.calculateCentroid()
    def __repr__(self):
        return str(self.points)
    def update(self, points):
        old_centroid = self.centroid
        self.points = points
        self.centroid = self.calculateCentroid()
        return getDistance(old_centroid, self.centroid)
    def calculateCentroid(self):
        reduce_coord = lambda i:reduce(lambda x,p : x + p.coords[i], self.points, 0.0)
        if len(self.points) == 0:
            print "Dividing by 0"
            self.points = [1]
        centroid_coords = [reduce_coord(i) / len(self.points) for i in range(self.n)]
        return Point(centroid_coords)

def kmeans(points, k, cutoff):
    initial = random.sample(points, k)
    clusters = [Cluster([p]) for p in initial]
    print clusters
    while True:
        lists = [ [] for c in clusters]
        for p in points:
            smallest_distance = getDistance(p, clusters[0].centroid)
            index = 0
            for i in range(len(clusters[1:])):
                distance = getDistance(p, clusters[i+1].centroid)
                if distance < smallest_distance:
                    smallest_distance = distance
                    index = i+1
                lists[index].append(p)
        biggest_shift = 0.0
        for i in range(len(clusters)):
            shift = clusters[i].update(lists[i])
            biggest_shift = max(biggest_shift, shift)
        if biggest_shift < cutoff:
            break
    return clusters

def getDistance(a, b):
    if a.n != b.n:
        raise Exception("ILLEGAL: non comparable points")
    ret = reduce(lambda x, y: x + pow((a.coords[y] - b.coords[y]), 2), range(a.n), 0.0)
    return math.sqrt(ret)

def makeRandomPoint(n, lower, upper):
    return Point([random.uniform(lower, upper) for i in range(n)])

def main():
    num_points, dim, k, cutoff, lower, upper = 10, 2, 3, 0.5, 0, 200
    points = map(lambda i: makeRandomPoint(dim, lower, upper), range(num_points))
    clusters = kmeans(points, k, cutoff)
    for i, c in enumerate(clusters):
        for p in c.points:
            print "Cluster: ", i, "\t Point: ", p

if __name__ == "__main__":
    main()
Sure enough, it’s not working!
Traceback (most recent call last):
  File "C:\Users\philippe\Documents\workspace-sts-2.7.2.RELEASE\scribber\kmeans\kmeans.py", line 100, in ?
    main()
  File "C:\Users\philippe\Documents\workspace-sts-2.7.2.RELEASE\scribber\kmeans\kmeans.py", line 92, in main
    clusters = kmeans(points, k, cutoff)
[[[89.152748179548524, 81.217634455465131]], [[83.439023369838509, 169.75355953688432]], [[1.8622622156419633, 41.364078271733739]]]
Dividing by 0
  File "C:\Users\philippe\Documents\workspace-sts-2.7.2.RELEASE\scribber\kmeans\kmeans.py", line 69, in kmeans
    shift = clusters[i].update(lists[i])
  File "C:\Users\philippe\Documents\workspace-sts-2.7.2.RELEASE\scribber\kmeans\kmeans.py", line 35, in update
    self.centroid = self.calculateCentroid()
  File "C:\Users\philippe\Documents\workspace-sts-2.7.2.RELEASE\scribber\kmeans\kmeans.py", line 43, in calculateCentroid
    centroid_coords = [reduce_coord(i) / len(self.points) for i in range(self.n)]
  File "C:\Users\philippe\Documents\workspace-sts-2.7.2.RELEASE\scribber\kmeans\kmeans.py", line 39, in <lambda>
    reduce_coord = lambda i:reduce(lambda x,p : x + p.coords[i], self.points, 0.0)
  File "C:\Users\philippe\Documents\workspace-sts-2.7.2.RELEASE\scribber\kmeans\kmeans.py", line 39, in <lambda>
    reduce_coord = lambda i:reduce(lambda x,p : x + p.coords[i], self.points, 0.0)
AttributeError: 'int' object has no attribute 'coords'
When I print lists inside the function kmeans(points, k, cutoff), I get
[[], [], []]. I’m trying to figure out why it ends up as a list of empty lists. I’ve posted the entire code so that anyone can run it and reproduce the error. The error log also shows what ‘clusters’ is: that list of points.
Thanks
The problem is that if the list of points closest to a given cluster is empty (all the points are closer to a different cluster), calculateCentroid would divide by zero. Your check for that case assigns garbage data ([1]) to self.points, and reduce then tries to read .coords from the integer 1, which causes the AttributeError you are seeing.
This is possible if two clusters have the same centroid, in which case the second cluster will never have points assigned to it.
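One way to handle an empty cluster, as a rough sketch only, is to leave its centroid where it was instead of overwriting self.points. The helper below assumes points are plain lists of coordinates rather than your Point objects, and the name centroid_or_previous is made up for illustration.

```python
def centroid_or_previous(points, previous):
    """Return the mean of `points`, or `previous` if `points` is empty.

    `points` is a list of equal-length coordinate lists and `previous` is
    the centroid from the last iteration (both names are illustrative).
    """
    if not points:
        # Nothing was assigned to this cluster: keep the old centroid
        # instead of dividing by zero.
        return list(previous)
    count = float(len(points))
    # zip(*points) groups the i-th coordinate of every point together.
    return [sum(axis) / count for axis in zip(*points)]
```

Keeping the old centroid is only one option; re-seeding the empty cluster from a random point is another, and which one works better depends on your data.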
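Incidentally, there’s another bug. You have an extra indent in front of
lists[index].append(p)
That line belongs directly in the body of the for p in points loop, so that each point is appended exactly once, after the closest cluster has been found.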
You should consider rewriting that whole loop using enumerate and min to make it cleaner anyway.
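To give an idea of what the enumerate and min version of the assignment step could look like, here is a minimal sketch. It assumes points and centroids are plain coordinate lists rather than Point objects, and the function names distance and assign_points are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length coordinate lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_points(points, centroids):
    """Group each point under the index of its nearest centroid."""
    lists = [[] for _ in centroids]
    for p in points:
        # min over (index, centroid) pairs, keyed on the distance to p;
        # on a tie, the centroid with the lower index wins.
        index, _ = min(enumerate(centroids),
                       key=lambda pair: distance(p, pair[1]))
        lists[index].append(p)  # exactly once per point
    return lists
```

Because the append sits directly in the loop over points, there is no inner loop left for it to be indented into by mistake.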
Here’s how I’d suggest rewriting things.