I have hit a real problem. I need to do some Kmeans clustering for

Question

Asked: May 16, 2026 (2026-05-16T10:56:10+00:00)

I have hit a real problem. I need to run k-means clustering on 5 million vectors, each with about 32 columns.
I tried Mahout, but it requires Linux and I am on Windows. I am not allowed to use a Linux OS or any sort of emulator.

Can anyone suggest a k-means clustering implementation that scales up to 5M vectors and converges quickly?

I have tested a few, but they won't scale: they are slow and take forever to complete.

Thanks

1 Answer

Editorial Team · Answer 1 · 2026-05-16T10:56:11+00:00

OK, so for anyone who wants clustering on large-scale datasets, the only way I found to do it is to use Mahout. It requires a Linux platform, so I had to install VirtualBox, put Ubuntu on it, and then run Mahout there. Setting up Mahout is a lengthy procedure, but the two links that I used are as follows.

http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)

http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)
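
As an aside for readers who cannot run a VM: this is not the Mahout route described above, but a rough sketch of mini-batch k-means, a different technique that updates centroids from small random samples instead of scanning all 5M vectors on every pass. The function names (`sq_dist`, `nearest`, `mini_batch_kmeans`) and the defaults for `batch_size` and `n_iter` are made up for illustration. It is plain Python so it runs anywhere, which also makes it slow; treat it as a way to see the idea, not something tuned for 5M x 32 data.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def mini_batch_kmeans(points, k, batch_size=100, n_iter=200, seed=0):
    """Mini-batch k-means sketch: each step looks at one random batch only."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # how many points each centroid has absorbed so far

    for _ in range(n_iter):
        batch = rng.sample(points, min(batch_size, len(points)))
        # Assign the whole batch using the centroids as they are right now.
        assignments = [nearest(p, centroids) for p in batch]
        for p, j in zip(batch, assignments):
            counts[j] += 1
            lr = 1.0 / counts[j]  # per-centroid step size shrinks over time
            c = centroids[j]
            for d in range(len(c)):
                c[d] += lr * (p[d] - c[d])  # move centroid toward the point
    return centroids
```

Each step costs roughly `batch_size * k` distance computations regardless of how many vectors there are in total, which is why this style of update tends to cope with large inputs better than full passes over the data. The trade-off is that the result is an approximation of what a full k-means run would give.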
