I want to split this data, ID x y 1 2.5 3.5 1 85.1

Question

0

Asked: June 17, 20262026-06-17T18:32:37+00:00 2026-06-17T18:32:37+00:00

I want to split this data, ID x y 1 2.5 3.5 1 85.1

0

I want to split this data,

ID x    y
1  2.5  3.5
1  85.1 74.1
2  2.6  3.4
2  86.0 69.8
3  25.8 32.9
3  84.4 68.2
4  2.8  3.2
4  24.1 31.8
4  83.2 67.4

I was able, making match with their partner like,

ID x    y    ID x    y   
1  2.5  3.5  1  85.1 74.1
2  2.6  3.4  2  86.0 69.8
             3  25.8 32.9
             4  24.1 31.8

However, as you notice some of the new row in ID 4 were placed wrong, because it just got added in the next few rows. I want to split them properly without having to use complex logic which I am already using… Someone can give me an algorithm or idea?

it should looks like,

ID x    y    ID x    y      ID x    y 
1  2.5  3.5  1  85.1 74.1   3  25.8 32.9
2  2.6  3.4  2  86.0 69.8   4  24.1 31.8
4  2.8  3.2  3  84.4 68.2
             4  83.2 67.4

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-17T18:32:38+00:00

It seems that your question is really about clustering, and that the ID column has nothing to do with the determining which points correspond to which.

A common algorithm to achieve that would be k-means clustering. However, your question implies that you don’t know the number of clusters in advance. This complicates matters, and there have been already a lot of questions asked here on StackOverflow regarding this issue:

Unfortunately, there is no “right” solution for this. Two clusters in one specific problem could be indeed considered as one cluster in another problem. This is why you’ll have to decide that for yourself.

Nevertheless, if you’re looking for something simple (and probably inaccurate), you can use Euclidean distance as a measure. Compute the distances between points (e.g. using pdist), and group points where the distance falls below a certain threshold.

Example

%// Sample input
A = [1,  2.5,  3.5;
     1,  85.1, 74.1;
     2,  2.6,  3.4;
     2,  86.0, 69.8;
     3,  25.8, 32.9;
     3,  84.4, 68.2;
     4,  2.8,  3.2;
     4,  24.1, 31.8;
     4,  83.2, 67.4];

%// Cluster points
pairs = nchoosek(1:size(A, 1), 2); %// Rows of pairs
d = sqrt(sum((A(pairs(:, 1), :) - A(pairs(:, 2), :)) .^ 2, 2)); %// d = pdist(A)
thr = d < 10;                      %// Distances below threshold
kk = 1;
idx = 1:size(A, 1);
C = cell(size(idx));               %// Preallocate memory
while any(idx)
     x = unique(pairs(pairs(:, 1) == find(idx, 1) & thr, :));
     C{kk} = A(x, :);
     idx(x) = 0;                   %// Remove indices from list
     kk = kk + 1;
end
C = C(~cellfun(@isempty, C));      %// Remove empty cells

The result is a cell array C, each cell representing a cluster:

C{1} =
    1.0000    2.5000    3.5000
    2.0000    2.6000    3.4000
    4.0000    2.8000    3.2000

C{2} =
    1.0000   85.1000   74.1000
    2.0000   86.0000   69.8000
    3.0000   84.4000   68.2000
    4.0000   83.2000   67.4000

C{3} = 
    3.0000   25.8000   32.9000
    4.0000   24.1000   31.8000

Note that this simple approach has the flaw of restricting the cluster radius to the threshold. However, you wanted a simple solution, so bear in mind that it gets complicated as you add more “clustering logic” to the algorithm.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I want to split this data, ID x y 1 2.5 3.5 1 85.1

Leave an answerCancel reply

1 Answer

Example

Leave an answer
Cancel reply