While most questions are about grouping nodes based on similarity (pigeonholes), I would like

Question

Asked: June 1, 2026

While most questions are about grouping nodes based on similarity (pigeonholes), I would like to group nodes based simply on their proximity.

I have a large, dense collection of nodes, potentially millions. On-screen they take up some amount of space, so they can be thought of as having a size.

What I am trying to do is group these nodes into single containing nodes efficiently, both in terms of processing time and in terms of fitting as many nodes as possible into each container.

My attempts so far have either been too slow or didn’t work, but they are all based on the same idea: calculate a lot of possible containers by taking a node and its surrounding nodes at random and grouping them, then pick the most effective container.

What are your ideas? They don’t need to be specific to any language, but I will be using PHP or JavaScript for this.

Edit

I forgot to mention that the nodes will be streamed in, so the solution needs to accept an unlimited number of nodes, putting them into containers as they arrive and creating new containers or even deleting them as necessary, for up to millions of containers. That would be ideal.

1 Answer

Editorial Team · Answer 1 · 2026-06-01T12:38:56+00:00

This problem is called clustering. You have a set of nodes and a function m that calculates the distance between any two nodes. You then search for clusters such that the sum of the distances between all nodes inside each cluster is minimal.
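To make that objective concrete, here is a minimal sketch in JavaScript. It assumes the nodes are 2-D points with `x` and `y` fields and uses Euclidean distance as the function m; the name `clusteringCost` is made up for illustration.

```javascript
// Euclidean distance between two 2-D points; stands in for the distance function m.
// (Assumption: any other distance function could be swapped in.)
function m(a, b) {
  return Math.hypot(a.x - b.x, a.y - b.y);
}

// Sum of m over every unordered pair of nodes inside each cluster.
// `clusters` is an array of arrays of points.
function clusteringCost(clusters) {
  let total = 0;
  for (const cluster of clusters) {
    for (let i = 0; i < cluster.length; i++) {
      for (let j = i + 1; j < cluster.length; j++) {
        total += m(cluster[i], cluster[j]);
      }
    }
  }
  return total;
}

const a = { x: 0, y: 0 };
const b = { x: 3, y: 4 };
const c = { x: 100, y: 0 };
const d = { x: 100, y: 1 };

// Grouping nearby nodes together gives a lower cost than mixing distant ones.
const tight = clusteringCost([[a, b], [c, d]]);
const loose = clusteringCost([[a, c], [b, d]]);
console.log(tight, loose);
```

Note that this sum only makes sense as a target when comparing groupings with the same number of clusters; otherwise putting every node in its own cluster trivially gives a cost of zero.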

There are some simple algorithms for this. Search for k-means and k-medoids, for example. These two are very similar to your approach. A more efficient version is the CLARANS algorithm [NH94]. I didn’t find any really good sources for you, but here are a few:

Lecture notes (in German) on clustering in general. They contain CLARANS in pseudocode on page 45:
http://www.informatik.hu-berlin.de/forschung/gebiete/wbi/teaching/archive/ws1112/vl_datawarehousing/15_clustering_12.pdf

English text that explains CLARANS:
http://bib.dbvis.de/uploadedFiles/232.pdf

Paper about CLARANS:
http://www.comp.nus.edu.sg/~atung/publication/pakdd002.pdf

The “k” in the names is the number of clusters. For all three of these algorithms you have to specify the number of clusters a priori.
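As a rough illustration of what “specifying k a priori” looks like, here is a small k-means sketch in JavaScript. It assumes 2-D points with `x` and `y` fields, a fixed number of iterations, and a naive seeding (the first k points); those choices, and the name `kMeans`, are mine and not taken from any library.

```javascript
// Minimal k-means (Lloyd's iteration) on 2-D points.
// k must be supplied by the caller; the algorithm never changes it.
function kMeans(points, k, iterations = 20) {
  // Assumption: seed the centroids with the first k points.
  // Real implementations usually pick seeds more carefully.
  let centroids = points.slice(0, k).map(p => ({ x: p.x, y: p.y }));
  let labels = new Array(points.length).fill(0);

  for (let it = 0; it < iterations; it++) {
    // Assignment step: each point joins its nearest centroid.
    labels = points.map(p => {
      let best = 0;
      let bestDist = Infinity;
      centroids.forEach((c, i) => {
        const dist = (p.x - c.x) ** 2 + (p.y - c.y) ** 2;
        if (dist < bestDist) {
          bestDist = dist;
          best = i;
        }
      });
      return best;
    });

    // Update step: move each centroid to the mean of its points.
    const sums = centroids.map(() => ({ x: 0, y: 0, n: 0 }));
    points.forEach((p, i) => {
      const s = sums[labels[i]];
      s.x += p.x;
      s.y += p.y;
      s.n += 1;
    });
    // An empty cluster keeps its previous centroid.
    centroids = sums.map((s, i) =>
      s.n > 0 ? { x: s.x / s.n, y: s.y / s.n } : centroids[i]
    );
  }
  return { centroids, labels };
}

const points = [
  { x: 0, y: 0 }, { x: 10, y: 10 }, { x: 0, y: 1 },
  { x: 1, y: 0 }, { x: 10, y: 11 }, { x: 11, y: 10 },
];
const result = kMeans(points, 2);
console.log(result.labels); // cluster index for each input point
```

With streamed input and an unknown number of containers, having to fix `k` up front is the main drawback of this family of algorithms.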

For a different approach, see the DBSCAN algorithm. You won’t need the number of clusters for this algorithm, but you do have to provide some other knowledge about your nodes: a neighbourhood radius and the minimum number of nodes that counts as a dense region. The Wikipedia article explains this very well. 🙂
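For comparison, here is a compact DBSCAN sketch in JavaScript. It assumes 2-D points with `x` and `y` fields and uses a brute-force neighbour search, which is far too slow for millions of nodes; a spatial index (a grid or a tree) would be needed there. The parameter names `eps` and `minPts` follow the usual description of the algorithm, and everything else is illustrative.

```javascript
const NOISE = -1;
const UNVISITED = -2;

// Returns one label per point: a cluster index (0, 1, ...) or NOISE.
function dbscan(points, eps, minPts) {
  const labels = new Array(points.length).fill(UNVISITED);

  // Brute-force lookup of all points within eps of point i (including i itself).
  const neighbours = i => {
    const out = [];
    for (let j = 0; j < points.length; j++) {
      const dist = Math.hypot(points[i].x - points[j].x, points[i].y - points[j].y);
      if (dist <= eps) out.push(j);
    }
    return out;
  };

  let cluster = 0;
  for (let i = 0; i < points.length; i++) {
    if (labels[i] !== UNVISITED) continue;

    const seeds = neighbours(i);
    if (seeds.length < minPts) {
      labels[i] = NOISE; // may later be relabelled as a border point
      continue;
    }

    // Point i is a core point: start a new cluster and grow it.
    labels[i] = cluster;
    const queue = seeds.filter(j => j !== i);
    while (queue.length > 0) {
      const j = queue.pop();
      if (labels[j] === NOISE) labels[j] = cluster; // border point joins the cluster
      if (labels[j] !== UNVISITED) continue;
      labels[j] = cluster;
      const more = neighbours(j);
      // Only core points extend the cluster further.
      if (more.length >= minPts) queue.push(...more);
    }
    cluster++;
  }
  return labels;
}

const points = [
  { x: 0, y: 0 }, { x: 0, y: 1 }, { x: 1, y: 0 },
  { x: 10, y: 10 }, { x: 10, y: 11 }, { x: 11, y: 10 },
  { x: 50, y: 50 },
];
const labels = dbscan(points, 1.5, 3);
console.log(labels); // the isolated point at (50, 50) ends up as NOISE
```

The number of clusters falls out of the data here, which fits the question better than a fixed `k`, although this batch version would still need adapting for streamed nodes.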
