I am creating a Corpus from a dataframe. I pass it as a VectorSource because there is only one column I want to use as the text source. This works fine; however, I need the document ids within the corpus to match the document ids from the dataframe. The document ids are stored in a separate column in the original dataframe.
df <- as.data.frame(t(rbind(c(1,3,5,7,8,10),
c("text", "lots of text", "too much text", "where will it end", "give peas a chance","help"))))
colnames(df) <- c("ids","textColumn")
library("tm")
library("lsa")
corpus <- Corpus(VectorSource(df[["textColumn"]]))
Running this code creates a corpus, but the document ids run from 1 to 6. Is there any way of creating the corpus with the document ids 1, 3, 5, 7, 8, 10?
Well, one simple but not very elegant way to assign your ids to your documents afterward could be the following:
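Here is a minimal sketch of that idea, not necessarily the exact code originally intended. It assumes a reasonably recent tm (0.6 or later), where each document carries its id in its metadata and `meta()` can set it. It also assumes a `VCorpus` built explicitly instead of `Corpus()`, so that the metadata lives on the individual documents. The data frame is rebuilt with `data.frame()` only to keep the example self-contained, and `doc_ids` is just an illustrative name.

```r
library("tm")

# Same data as in the question, built directly as a data frame
df <- data.frame(
  ids = c(1, 3, 5, 7, 8, 10),
  textColumn = c("text", "lots of text", "too much text",
                 "where will it end", "give peas a chance", "help"),
  stringsAsFactors = FALSE
)

# Build the corpus from the text column only
corpus <- VCorpus(VectorSource(df[["textColumn"]]))

# Overwrite each document's id with the matching id from the data frame
for (i in seq_len(length(corpus))) {
  meta(corpus[[i]], "id") <- as.character(df[["ids"]][i])
}

# Collect the ids back out of the corpus to check the assignment
doc_ids <- vapply(seq_len(length(corpus)),
                  function(i) meta(corpus[[i]], "id"),
                  character(1))
print(doc_ids)
```

The loop relies on the corpus documents being in the same order as the rows of the data frame, which holds here because the corpus is built straight from `df[["textColumn"]]`.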