I haven’t had much experience with machine learning or clustering, so I’m at a bit of a loss as to how to approach this problem. My data of interest consists of 4 columns, one of which is just an id. The other 3 contain numerical values >= 0. The clustering I need is quite straightforward and I could do it by hand, but it will get less clear later on, so I want to start out with the right sort of process. I need 6 clusters, which depend on the 3 columns (call them A, B and C) as follows:
A     B     C          Cluster
----  ----  ---------  -------
0     0     0          0
0     0     >0         1
0     >0    <=B        2
0     >0    >B         3
>0    any   <=(A+B)    4
>0    any   >(A+B)     5
At this stage, these clusters will give some insight into the data to inform further analysis.
Since I’m quite new to this, I haven’t yet learned enough about the various clustering algorithms to know where to start. Could anyone suggest an appropriate model to use, or a few that I can research?
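This does not look like clustering to me: your groups are defined by fixed rules that you already know, whereas clustering discovers groups from the data.
Instead, I figure you want a simple decision tree classification.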
It should already be available in RapidMiner.
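To illustrate, the rules in the table can be written out directly as a small hand-built decision tree. This is only a minimal sketch in plain Python, not RapidMiner's operator; the function name `assign_cluster` and the sample rows are made up for illustration.

```python
def assign_cluster(a, b, c):
    """Map one (A, B, C) row to a cluster label 0-5, following the table in the question."""
    if a < 0 or b < 0 or c < 0:
        # The question states that all values are >= 0.
        raise ValueError("A, B and C are expected to be >= 0")
    if a > 0:
        # Rows with A > 0: B can be anything, compare C with A + B.
        return 4 if c <= a + b else 5
    if b > 0:
        # Rows with A == 0 and B > 0: compare C with B.
        return 2 if c <= b else 3
    # Rows with A == 0 and B == 0: only C decides.
    return 0 if c == 0 else 1


# Hypothetical sample rows, one per cluster.
rows = [(0, 0, 0), (0, 0, 3), (0, 5, 2), (0, 5, 9), (1, 0, 1), (1, 2, 7)]
labels = [assign_cluster(*row) for row in rows]
print(labels)  # [0, 1, 2, 3, 4, 5]
```

If you later want a tree learner to find such rules from labeled examples, keep in mind that standard decision trees usually split on one attribute at a time, so comparisons like C <= B or C <= A+B are probably easier to learn if you add derived attributes such as C - B and C - (A + B) first.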