I’m trying to cluster some data with python and scipy but the following code

Question

Asked: June 4, 2026


I’m trying to cluster some data with Python and SciPy, but the following code does not work for a reason I do not understand:

from scipy.sparse import *
matrix = dok_matrix((en,en), int)

for pub in pubs:
    authors = pub.split(";")
    for auth1 in authors:
        for auth2 in authors:
            if auth1 == auth2: continue
            id1 = e2id[auth1]
            id2 = e2id[auth2]
            matrix[id1, id2] += 1

from scipy.cluster.vq import vq, kmeans2, whiten
result = kmeans2(matrix, 30)
print result

It says:

Traceback (most recent call last):
  File "cluster.py", line 40, in <module>
    result = kmeans2(matrix, 30)
  File "/usr/lib/python2.7/dist-packages/scipy/cluster/vq.py", line 683, in kmeans2
    clusters = init(data, k)
  File "/usr/lib/python2.7/dist-packages/scipy/cluster/vq.py", line 576, in _krandinit
    return init_rankn(data)
  File "/usr/lib/python2.7/dist-packages/scipy/cluster/vq.py", line 563, in init_rankn
    mu  = np.mean(data, 0)
  File "/usr/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 2374, in mean
    return mean(axis, dtype, out)
TypeError: mean() takes at most 2 arguments (4 given)

When I use kmeans instead of kmeans2, I get the following error:

Traceback (most recent call last):
  File "cluster.py", line 40, in <module>
    result = kmeans(matrix, 30)
  File "/usr/lib/python2.7/dist-packages/scipy/cluster/vq.py", line 507, in kmeans
    guess = take(obs, randint(0, No, k), 0)
  File "/usr/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 103, in take
    return take(indices, axis, out, mode)
TypeError: take() takes at most 3 arguments (5 given)

I think I have these problems because I’m using sparse matrices, but my matrices are too big to fit in memory otherwise. Is there a way to use the standard clustering algorithms from SciPy with sparse matrices? Or do I have to re-implement them myself?

I created a new version of my code that works with a vector space (one row per publication, one column per author):

el = len(experts)
pl = len(pubs)
print el, pl

from scipy.sparse import *
P = dok_matrix((pl, el), int)

p_id = 0
for pub in pubs:
    authors = pub.split(";")
    for auth1 in authors:
        if len(auth1) < 2: continue
        id1 = e2id[auth1]
        P[p_id, id1] = 1

from scipy.cluster.vq import kmeans, kmeans2, whiten
result = kmeans2(P, 30)
print result

But I’m still getting the error:

TypeError: mean() takes at most 2 arguments (4 given)

What am I doing wrong?


1 Answer

Editorial Team · Answer 1 · 2026-06-04T00:34:31+00:00


K-means cannot be run on distance matrices.

It needs a vector space in which to compute means; that is why it is called k-means. If you want to use a distance matrix, you need to look into purely distance-based algorithms such as DBSCAN and OPTICS (both are described on Wikipedia).
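As for the TypeError itself: the tracebacks suggest that the functions in scipy.cluster.vq expect a dense NumPy array, and that np.mean and np.take hand the call over to the sparse matrix's own methods, which accept fewer arguments. If the data really is a vector space but too large to densify, one option is to run Lloyd's algorithm directly on the sparse rows. Below is a minimal sketch in plain Python, not SciPy's implementation: the function name sparse_kmeans, the dict-per-row layout, and the toy data are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def sparse_kmeans(rows, k, n_iter=20, seed=0):
    """Lloyd's k-means over sparse rows (hypothetical helper, not a SciPy API).

    rows: list of dicts mapping column index -> nonzero value.
    Returns (labels, centroids); centroids are stored as sparse dicts too.
    """
    rng = random.Random(seed)
    # Assumed initialisation: start from k distinct observations.
    centroids = [dict(rows[i]) for i in rng.sample(range(len(rows)), k)]
    labels = [-1] * len(rows)

    for _ in range(n_iter):
        # Squared norm of each centroid, reused for every row.
        c_norms = [sum(v * v for v in c.values()) for c in centroids]

        new_labels = []
        for row in rows:
            r_norm = sum(v * v for v in row.values())
            best, best_d = 0, float("inf")
            for j, c in enumerate(centroids):
                # ||r - c||^2 = ||r||^2 - 2 r.c + ||c||^2; the dot product
                # only visits the row's nonzero columns.
                dot = sum(v * c.get(col, 0.0) for col, v in row.items())
                d = r_norm - 2.0 * dot + c_norms[j]
                if d < best_d:
                    best, best_d = j, d
            new_labels.append(best)

        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels

        # Update step: each centroid becomes the mean of its members.
        sums = [defaultdict(float) for _ in range(k)]
        counts = [0] * k
        for row, lab in zip(rows, labels):
            counts[lab] += 1
            for col, v in row.items():
                sums[lab][col] += v
        for j in range(k):
            if counts[j]:  # keep the old centroid if a cluster is empty
                centroids[j] = {c: s / counts[j] for c, s in sums[j].items()}

    return labels, centroids


# Toy publication-by-author data (made up): one dict per publication,
# keys are author ids, value 1 means "this author is on the paper".
pubs = [
    {0: 1, 1: 1},
    {0: 1, 1: 1, 2: 1},
    {1: 1, 2: 1},
    {7: 1, 8: 1},
    {7: 1, 8: 1, 9: 1},
    {8: 1, 9: 1},
]
labels, centroids = sparse_kmeans(pubs, k=2)
print(labels)
```

On this toy input the first three publications should end up in one cluster and the last three in the other; which cluster gets label 0 depends on the random start. If the dense matrix does fit in memory, converting it first (for example with P.toarray().astype(float)) and passing that to kmeans2 should also avoid the TypeError. Note too that the second snippet in the question never increments p_id, so every publication is written to row 0, which looks unintended.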
