Ok this is going to sound really confusing but I will try my best

Question

0

Asked: June 7, 20262026-06-07T20:24:26+00:00 2026-06-07T20:24:26+00:00

Ok this is going to sound really confusing but I will try my best

0

Ok this is going to sound really confusing but I will try my best to make it clear enough. I have a full dataset called fulldata this dataset is 494021x6.

I use svds (singular value decomposition) on it like so:

%% dimensionality reduction 
columns = 6
[U,S,V]=svds(fulldata,columns);

I then randomly select 1000 rows from the fulldata:

%% randomly select dataset
rows = 1000;
columns = 6;

%# pick random rows
indX = randperm( size(fulldata,1) );
indX = indX(1:rows)';

%# pick columns in a set order (2,4,5,3,6,1)
indY = indY(1:columns);

%# filter data
data = U(indX,indY);

I then apply normalization to this randomly selected 1000 rows:

% apply normalization method to every cell
maxData = max(max(data));
minData = min(min(data));
data = ((data-minData)./(maxData));

I then output a datasample from the original fulldata set which matches the 1000 selected rows:

% output matching data
dataSample = fulldata(indX, :)

Also note that when I picked “random rows” I also output the indX rows which match the rows in the fulldata.

So datasample looks like this:

enter image description here

Which is the 1000 random rows which match the original fulldata.

And indX looks like this:

enter image description here

Which is the corresponding row number from fulldata.

The problem im arriving at is when I use K-Means to cluster the 1000 random rows and I output the data of each cluster like so:

%% generate sample data
K = 6;
numObservarations = size(data, 1);
dimensions = 3;

%% cluster
opts = statset('MaxIter', 100, 'Display', 'iter');
[clustIDX, clusters, interClustSum, Dist] = kmeans(data, K, 'options',opts, ...
'distance','sqEuclidean', 'EmptyAction','singleton', 'replicates',3);

%% plot data+clusters
figure, hold on
scatter3(data(:,1),data(:,2),data(:,3), 5, clustIDX, 'filled')
scatter3(clusters(:,1),clusters(:,2),clusters(:,3), 100, (1:K)', 'filled')
hold off, xlabel('x'), ylabel('y'), zlabel('z')
grid on
view([90 0]);

%% plot clusters quality
figure
[silh,h] = silhouette(data, clustIDX);
avrgScore = mean(silh);

% output the contents of each cluster
K1 = data(clustIDX==1,:)
K2 = data(clustIDX==2,:)
K3 = data(clustIDX==3,:)
K4 = data(clustIDX==4,:)
K5 = data(clustIDX==5,:)
K6 = data(clustIDX==6,:)

How can I match K1, k2… K6 to the corresponding indX row number? For instance K1’s output looks like this:

enter image description here

I was hoping to have extra files like K1-indX which is just a list of corresponding row numbers from indX which match the cluster data from K1, K2… etc. Or possibly append the indX row number into the K1, K2 output in column 7 (preferable)

For instance:

K1 cluster data | Belongs to fulldata row number
0.4 0.5 0.6 0.4 | 456456 etc

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-07T20:24:27+00:00

An example to illustrate:

%# lets use an example data of size 150x4
load fisheriris
fulldata = meas;

%# pick 100 rows at random
rIdx = randperm(size(fulldata,1));
rIdx = rIdx(1:100)';                  %#'
data = fulldata(rIdx,:);

%# cluster the subset data
K = 3;
clustIDX = kmeans(data, K);

%# divide the data according to which cluster instances were assigned to
groupedIdx = cell(K,1);
groupedData = cell(K,1);
for i=1:K
    %# instances
    groupedData{i} = data(clustIDX==i,:);

    %# corresponding row indices into the original fulldata
    groupedIdx{i} = rIdx(clustIDX==i);
end

%# check: these two should be equal
groupedData{1}(1,:)
fulldata(groupedIdx{1}(1),:)

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

Ok this is going to sound really confusing but I will try my best

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply