EDIT: This was an issue with objects in the workspace conflicting and causing unexpected

Question

Asked: May 29, 2026 (2026-05-29T18:53:57+00:00)

EDIT: This was an issue with objects in the workspace conflicting and causing unexpected behavior.

I am trying to create a DocumentTermMatrix from a document using the following code. The document contains many 1- and 2-character tokens. However, even with the minimum word length set to 1 character, the resulting matrix contains 699 documents and 0 terms.

library(tm)
data <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data",header=FALSE)
data <- data[-1]

training_data <- as.vector(apply(as.matrix(data, mode="character"),1,paste,collapse=" "))
corpus <- Corpus(VectorSource(training_data))

matrix <- DocumentTermMatrix(corpus,control=list(wordLengths=c(1,Inf)))

Can anyone shed some light on why no terms are created, even though the data contain many 1- and 2-character tokens? Here is one sample data entry:

" 4  8  8  5  4 5 10  4  1 4"
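
As a quick sanity check that the entry really does contain short tokens, it can be split on whitespace in base R. This is only a sketch: it uses plain `strsplit`, not tm's own tokenizer, so it confirms what is in the string rather than what `DocumentTermMatrix` does with it. The variable names `entry` and `tokens` are made up for illustration.

```r
# The sample entry from above, copied verbatim
entry <- " 4  8  8  5  4 5 10  4  1 4"

# Trim the leading space, then split on runs of whitespace
tokens <- strsplit(trimws(entry), "\\s+")[[1]]

# Inspect the tokens and their lengths in characters
print(tokens)
print(table(nchar(tokens)))
```

Every token here is 1 or 2 characters long, so a lower bound of 1 in `wordLengths` should not exclude any of them.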

1 Answer

Editorial Team · Answer 1 · 2026-05-29T18:54:01+00:00

I ran exactly the code you posted in the latest versions of R and tm on a Windows 7 machine and got the results you were looking for (see below). I’d try clearing your workspace, exiting R, and/or rebooting.

> library(tm)
> data <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data",header=FALSE)
> data <- data[-1]
> 
> training_data <- as.vector(apply(as.matrix(data, mode="character"),1,paste,collapse=" "))
> corpus <- Corpus(VectorSource(training_data))
> 
> matrix <- DocumentTermMatrix(corpus,control=list(wordLengths=c(1,Inf)))
> matrix
A document-term matrix (699 documents, 11 terms)

Non-/sparse entries: 2899/4790
Sparsity           : 62%
Maximal term length: 2 
Weighting          : term frequency (tf)
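
If restarting is inconvenient, a rough way to look for the kind of workspace conflict mentioned in the EDIT is to list workspace objects that share a name with something in an attached package. This is only a sketch: `masked_by_workspace` is a hypothetical helper name, not part of R or tm, and a name clash does not by itself prove that the object caused the problem.

```r
# Hypothetical helper: names defined in `env` (the global workspace by
# default) that also exist in any currently attached package.
masked_by_workspace <- function(env = globalenv()) {
  workspace_names <- ls(envir = env)

  # Entries on the search path that correspond to attached packages
  pkgs <- grep("^package:", search(), value = TRUE)

  # All names exported by those packages
  pkg_names <- unique(unlist(lapply(pkgs, function(p) {
    ls(envir = as.environment(p))
  })))

  intersect(workspace_names, pkg_names)
}

# List potential conflicts in the current workspace
print(masked_by_workspace())
```

With the question's code loaded, names such as `data` and `matrix` would be reported, because utils and base define functions with those names, though that alone does not show they caused the empty matrix. Running `rm(list = ls())` afterwards clears the workspace, which is the first step suggested above.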
