I have a clustering problem that could be summarized this way: i have N

Question

Editorial Team

Asked: June 3, 2026

I have a clustering problem that can be summarized this way:

I have N particles in a 3D space
each particle can interact with a different number of other particles
each interaction has a strength
I don’t know the number of clusters a priori
I don’t have learning samples (the method should be unsupervised)

Output: I’d like to get:

the number of clusters
a probability for each particle to be part of a cluster (to be able to remove particles not clearly assigned)

I want to call the clusterer directly from my Java code.

Question:

What clusterer would best fit my problem?
How should I format my data?
Should I use the 3D position information in addition to the interaction information?
How can I get the result for each particle?

I’m very new to Weka, but from what I could find on the Internet:

SOM could solve my problem
it is a multi-instance problem, but I couldn’t find any examples showing how to create relational data. Does SOM support relational attributes?

Thanks for your help.
jeannot

1 Answer

Editorial Team · Answer 1 · 2026-06-03T15:51:51+00:00

Weka is quite limited when it comes to clustering. It has only a few clustering algorithms, and they are not very flexible. I’m not sure whether you could feed the interaction strength into any of the Weka clustering algorithms.

You might want to have a look at ELKI. It has much more advanced clustering algorithms than Weka, and they are very flexible. For example, you can easily define your own distance function (Tutorial) and use it in any distance-based clustering algorithm.
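To illustrate what a distance derived from interaction strength could look like, here is a minimal sketch in plain Java. It does not use the ELKI API. The class `InteractionDistance`, the `{i, j, strength}` input layout, and the rule `1 - strength / maxStrength` are all assumptions made for the example; any monotone decreasing mapping from strength to distance would serve the same purpose, and pairs that do not interact are simply treated as infinitely far apart.

```java
import java.util.Arrays;

/** Hypothetical helper: turns pairwise interaction strengths into a distance matrix. */
class InteractionDistance {

    /**
     * @param n     number of particles
     * @param pairs rows of {i, j, strength}; assumed strength > 0, larger means stronger
     * @return symmetric n x n matrix; non-interacting pairs get POSITIVE_INFINITY
     */
    static double[][] toDistanceMatrix(int n, double[][] pairs) {
        // Find the strongest interaction so strengths can be normalised.
        double max = 0.0;
        for (double[] p : pairs) {
            max = Math.max(max, p[2]);
        }
        double[][] d = new double[n][n];
        for (double[] row : d) {
            Arrays.fill(row, Double.POSITIVE_INFINITY);
        }
        for (int i = 0; i < n; i++) {
            d[i][i] = 0.0; // a particle is at distance 0 from itself
        }
        for (double[] p : pairs) {
            int i = (int) p[0];
            int j = (int) p[1];
            double dist = 1.0 - p[2] / max; // assumed rule: strongest pair gets distance 0
            d[i][j] = dist;
            d[j][i] = dist;
        }
        return d;
    }

    public static void main(String[] args) {
        double[][] pairs = { {0, 1, 4.0}, {1, 2, 1.0} };
        System.out.println(Arrays.deepToString(toDistanceMatrix(3, pairs)));
    }
}
```

Whether the 3D positions should also enter this distance (for example as a weighted sum with the Euclidean distance) depends on what a useful cluster means for the task.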

Which clustering algorithm is appropriate is not something we can answer here. You need to try several, with different parameters. The key question you should try to answer first is: what is a useful cluster for you?

You have started to pose some of these questions. For example, whether you want to use interaction strength only, or whether to also include positional information. But as I do not know what you want to achieve, I can’t tell you how.

Definitely have a look at the DBSCAN and OPTICS algorithms (for OPTICS in particular, don’t use the one in Weka: it is slow, incomplete and unmaintained). Maybe start by reading their Wikipedia articles to see whether they make sense for your task. Here is why I believe they are helpful for you:

They do not need to know the number of clusters (unlike k-means and EM clustering)
They need a “minimum points” parameter, which is essentially a “minimum cluster size”; it controls how fine-grained the result becomes. Increase it to get fewer and larger clusters.
They can use arbitrary distance or similarity functions (for example, interaction strength). For DBSCAN you need to set a threshold for what counts as a significant interaction; for OPTICS this is not necessary.
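The properties listed above can be seen in a minimal DBSCAN sketch over a precomputed distance matrix. This is plain Java written for illustration, not the ELKI or Weka implementation; the class name `SimpleDbscan` and its method signatures are assumptions, and the neighbor search is a naive O(N²) scan. Note that the input is only a matrix of pairwise distances, so it could come from interaction strengths, from 3D positions, or from a combination of both.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Illustrative DBSCAN on a precomputed distance matrix (hypothetical class). */
class SimpleDbscan {
    static final int NOISE = -1;
    private static final int UNVISITED = -2;

    /** Returns one label per point: a cluster id starting at 0, or NOISE. */
    static int[] cluster(double[][] dist, double eps, int minPts) {
        int n = dist.length;
        int[] label = new int[n];
        Arrays.fill(label, UNVISITED);
        int nextCluster = 0;
        for (int p = 0; p < n; p++) {
            if (label[p] != UNVISITED) continue;
            List<Integer> seeds = neighbors(dist, p, eps);
            if (seeds.size() < minPts) {   // not a core point (may become a border point later)
                label[p] = NOISE;
                continue;
            }
            int c = nextCluster++;
            label[p] = c;
            Deque<Integer> queue = new ArrayDeque<>(seeds);
            while (!queue.isEmpty()) {
                int q = queue.poll();
                if (label[q] == NOISE) label[q] = c;   // border point reached from a core point
                if (label[q] != UNVISITED) continue;   // already assigned
                label[q] = c;
                List<Integer> qn = neighbors(dist, q, eps);
                if (qn.size() >= minPts) queue.addAll(qn); // q is a core point: keep expanding
            }
        }
        return label;
    }

    /** All points within eps of p, including p itself. */
    static List<Integer> neighbors(double[][] dist, int p, double eps) {
        List<Integer> out = new ArrayList<>();
        for (int q = 0; q < dist.length; q++) {
            if (dist[p][q] <= eps) out.add(q);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2, 10, 11, 50};   // toy 1D positions
        double[][] dist = new double[x.length][x.length];
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < x.length; j++)
                dist[i][j] = Math.abs(x[i] - x[j]);
        System.out.println(Arrays.toString(cluster(dist, 1.5, 2)));
    }
}
```

The number of clusters is not an input; it falls out of `eps` and `minPts`. Points labelled `NOISE` belong to no dense region, which gives a hard (not probabilistic) way to discard particles that are not clearly assigned.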

Next I would probably use the interaction-strength data with OPTICS and try the Xi extraction of clusters, to see whether it makes sense for your use case (Weka doesn’t have the Xi extraction). Or maybe look at the OPTICS plot first, to see whether your similarity and MinPts parameter actually produce the “valleys” you need for OPTICS.
DBSCAN is faster, but you need to fix the distance threshold. If your data set is very large, you might want to start with OPTICS on a sample, then decide on a few epsilon values and run DBSCAN on the full data set with these values.
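As a rough aid for picking candidate epsilon values, one can also look at the sorted k-nearest-neighbor distances. This is a different and simpler heuristic than the OPTICS plot described above, swapped in here because it fits in a few lines; the class `KDistance` is hypothetical, and the sketch assumes a precomputed distance matrix with zeros on the diagonal.

```java
import java.util.Arrays;

/** Hypothetical helper: sorted k-distance list for choosing a DBSCAN epsilon. */
class KDistance {

    /** Distance from each point to its k-th nearest other point, sorted in descending order. */
    static double[] sortedKDistances(double[][] dist, int k) {
        int n = dist.length;
        double[] kd = new double[n];
        for (int p = 0; p < n; p++) {
            double[] row = dist[p].clone();
            Arrays.sort(row);                    // row[0] is the point itself (distance 0)
            kd[p] = row[Math.min(k, n - 1)];     // k-th nearest other point
        }
        Arrays.sort(kd);
        // Reverse in place to get descending order.
        for (int i = 0, j = n - 1; i < j; i++, j--) {
            double tmp = kd[i];
            kd[i] = kd[j];
            kd[j] = tmp;
        }
        return kd;
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2, 10, 11, 50};   // toy 1D positions
        double[][] dist = new double[x.length][x.length];
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < x.length; j++)
                dist[i][j] = Math.abs(x[i] - x[j]);
        System.out.println(Arrays.toString(sortedKDistances(dist, 1)));
    }
}
```

A sharp drop in the sorted list suggests a threshold that separates sparse points from those in dense regions; values just below the drop are candidates to try as epsilon.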

Still, start reading here to see if that makes sense for your task:

https://en.wikipedia.org/wiki/DBSCAN#Basic_idea
