I am writing a program to implement k-means clustering.
Consider a simple input with 4 vertices a, b, c and d, with the following edge costs:
[vertex1] [vertex2] [edge cost]
a b 1
a c 2
a d 3
b d 4
c d 5
Now I need to make the program run until I get 2 clusters.
My doubt is this: in the first step, when I calculate the minimum distance, it is a->b (edge cost 1). Now I should consider ab as a single cluster. If that is the case, what will be the distance of ab from c and d?
The K-means algorithm works as follows:
Go back to step 2 and stop when, in step 3, no vertex gets assigned to another centroid, or when your error condition is satisfied.
In your case, as you have an undirected graph, it’d be better to generate coordinates for each vertex from the edge distances, and then apply the algorithm.
If you don’t want to do this initial process, you may calculate the distance from a vertex to all other reachable vertices, but you’d have to do this for every iteration — which is quite an unnecessary overhead.
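As a rough illustration of the assign-and-update loop, here is a minimal sketch of the standard k-means iteration (Lloyd's algorithm). It assumes the vertices have already been given coordinates; the function name `kmeans`, the sample points, and the choice of initial centroids are all hypothetical and not taken from your graph.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd-style k-means on coordinate tuples, starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stop when no point was assigned to a different centroid.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

# Hypothetical 2-D points forming two obvious groups.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centroids = kmeans(points, centroids=[(0, 0), (10, 10)])
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [(0.0, 0.5), (10.0, 10.5)]
```

The initial centroids are passed in explicitly so the run is repeatable; picking them at random is also common, but then the result can differ between runs.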
For your undirected graph:
The table of distances would be something like:
If this is the table you want, simply run Dijkstra's algorithm on your graph from each vertex, and use the resulting table as your table of distances.
The table would then hold the minimal distances, but if you have some other policy for computing distances, it’s entirely up to you to define it.
Notice also that if your graph is directed, the matrix will not necessarily be symmetric, as it is in this case.
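For the edge list in the question, the shortest-path table can be computed with something like the following. This is only a sketch: the helper names `build_graph` and `dijkstra` are my own, and it assumes non-negative edge costs and an undirected graph.

```python
import heapq

# Edge list from the question: (vertex1, vertex2, edge cost).
edges = [("a", "b", 1), ("a", "c", 2), ("a", "d", 3), ("b", "d", 4), ("c", "d", 5)]

def build_graph(edges):
    """Build an undirected adjacency list from (u, v, cost) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    return graph

def dijkstra(graph, source):
    """Return the minimal distance from source to every vertex."""
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

graph = build_graph(edges)
# One Dijkstra run per vertex gives the full table of distances.
table = {v: dijkstra(graph, v) for v in sorted(graph)}
for v in sorted(table):
    print(v, [table[v][u] for u in sorted(table)])
# a [0, 1, 2, 3]
# b [1, 0, 3, 4]
# c [2, 3, 0, 5]
# d [3, 4, 5, 0]
```

Note that b and c have no direct edge, so their distance of 3 comes from the path through a.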