Recoding is a common practice for survey data, but the most obvious routes take more time than they should.
The fastest code that accomplishes the same task with the provided sample data by system.time() on my machine wins.
## Sample data
dat <- cbind(rep(1:5,50000),rep(5:1,50000),rep(c(1,2,4,5,3),50000))
dat <- cbind(dat,dat,dat,dat,dat,dat,dat,dat,dat,dat,dat,dat)
dat <- as.data.frame(dat)
re.codes <- c("This","That","And","The","Other")
Code to optimize.
for(x in 1:ncol(dat)) {
dat[,x] <- factor(dat[,x], labels=re.codes)
}
Current system.time():
user system elapsed
4.40 0.10 4.49
Hint: dat <- lapply(1:ncol(dat), function(x) dat[,x] <- factor(dat[,x],labels=rc))) is not any faster.
Combining @DWin’s answer, and my answer from Most efficient list to data.frame method?: