I have a question about k-means clustering in R. Actually I’m doing everything according


Asked: May 26, 2026


I have a question about k-means clustering in R. I’m following this article step by step. Everything is based on examples shipped with the tm package, so no data import is required: acq contains 50 documents and crude contains 20 documents.

library(tm)
data("acq")
data("crude")
ws <- c(acq, crude)
wsTDM <- Data(TermDocumentMatrix(ws)) #First problem here
wsKMeans <- kmeans(wsTDM, 2)
wsReutersCluster <- c(rep("acq", 50), rep("crude", 20))
cl_agreement(wsKMeans, as.cl_partition(wsReutersCluster), "diag")

Error in lapply(X, FUN, ...) : 
(list) object cannot be coerced to type 'integer'

What I actually want is a cross-agreement matrix. But this article was written in 2008, and a lot has changed since then. The Data function now seems to exist only in the RSurvey package, and I doubt it is the same function. I think the main problem is that TermDocumentMatrix used to be an S4 class and is now S3. I know it’s possible to do this with the raw text only, but I want to do it this way because a TDM lets me remove stopwords, punctuation, etc. for better results. If anyone has a solution, that would be terrific.
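For reference, the preprocessing mentioned above does not need a separate pass over the corpus: TermDocumentMatrix() accepts a control list that is handed to termFreq() when the matrix is built. This is a minimal sketch, assuming a current tm release; the particular options chosen here are illustrative, not the ones used in the article.

```r
library(tm)

data("acq")    # 50 Reuters documents shipped with tm
data("crude")  # 20 Reuters documents shipped with tm
ws <- c(acq, crude)

# Preprocessing options go in the control list; termFreq() applies them
# (lower-casing is on by default) while counting terms per document
wsTDM <- TermDocumentMatrix(ws, control = list(
  removePunctuation = TRUE,
  removeNumbers     = TRUE,
  stopwords         = TRUE   # default English stopword list from tm
))

dim(wsTDM)  # terms x documents; the second dimension is 70
```

Stopword removal is the reason "the", "and", etc. disappear from Terms(wsTDM); pass a character vector instead of TRUE to use a custom list.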


1 Answer

Editorial Team · Answer 1 · 2026-05-26T18:33:51+00:00

The TDM is stored as a sparse matrix, as described in ?TermDocumentMatrix; you can see this by inspecting the object, e.g. with str(wsTDM). The old Data() function was just a way to access the contents as a regular matrix, and it is no longer needed. Build the matrix with wsTDM <- TermDocumentMatrix(ws), then call kmeans(wsTDM, 2), and you’ll see that the output is as expected, with clusters identified for 2775 observations (terms) on 70 features (documents). Good luck!
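Note that kmeans() clusters the rows of its input, so kmeans(wsTDM, 2) groups terms. To compare the result against the 70 document labels in wsReutersCluster, the documents have to be the rows. The following is a minimal sketch of that, assuming the clue package (which provides cl_agreement() and as.cl_partition()) is installed; as.matrix() stands in for the old Data(), and set.seed(1) is an arbitrary choice, so the agreement value will vary with the seed.

```r
library(tm)
library(clue)  # cl_agreement(), as.cl_partition()

data("acq")
data("crude")
ws <- c(acq, crude)
wsTDM <- TermDocumentMatrix(ws)

# Transpose so the 70 documents are the observations (rows) for kmeans();
# as.matrix() converts the sparse TDM to a dense matrix
wsMatrix <- t(as.matrix(wsTDM))

set.seed(1)  # kmeans() starts from random centres
wsKMeans <- kmeans(wsMatrix, centers = 2)

# Reference labels in corpus order: 50 acq documents, then 20 crude
wsReutersCluster <- c(rep("acq", 50), rep("crude", 20))

# With y given, cl_agreement() returns a cross-agreement matrix;
# "diag" is the maximal co-classification rate over cluster/label matchings
agreement <- cl_agreement(wsKMeans, as.cl_partition(wsReutersCluster), "diag")
print(agreement)

# Plain cross-tabulation of cluster ids against the true labels
table(cluster = wsKMeans$cluster, label = wsReutersCluster)
```

The preprocessed matrix from a control list (stopwords, punctuation, etc.) can be dropped in for wsTDM without other changes.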
