Using a data.table, what would be the fastest way to “sweep” out a statistic across a selection of columns?
Starting with (considerably larger versions of) DT:
library(data.table)
p <- 3   # number of columns, including id
DT <- data.table(id = c("A","B","C"), x1 = c(10,20,30), x2 = c(20,30,10))
DT.totals <- DT[, list(id, total = x1 + x2)]
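Because the real tables are wider, naming each column in `x1 + x2` does not scale. A minimal sketch, assuming every target column is numeric, computes the same row totals by position (2:p) with `rowSums()`, so the id column is skipped without being named:

```r
library(data.table)

p  <- 3
DT <- data.table(id = c("A", "B", "C"), x1 = c(10, 20, 30), x2 = c(20, 30, 10))

# Select the target columns by position; with = FALSE makes 2:p a column index.
target <- DT[, 2:p, with = FALSE]

# Row totals over columns 2:p, paired with the id column.
DT.totals <- data.table(id = DT$id, total = rowSums(target))
```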
I’d like to get to the following data.table result by indexing the target columns (2:p) in order to skip the key:
     id   x1   x2
[1,]  A 0.33 0.67
[2,]  B 0.40 0.60
[3,]  C 0.75 0.25
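For comparison, a straightforward baseline (a sketch, not a claim about which approach is fastest) updates the target columns by reference with `:=` and `lapply()` over `.SD`. The `(cols) :=` form assumes a reasonably recent data.table version; `cols` and `total` are illustrative names:

```r
library(data.table)

p  <- 3
DT <- data.table(id = c("A", "B", "C"), x1 = c(10, 20, 30), x2 = c(20, 30, 10))

cols  <- names(DT)[2:p]                      # target columns, skipping id
total <- rowSums(DT[, cols, with = FALSE])   # per-row statistic, computed before any update

# Divide each target column by the row total, modifying DT in place.
DT[, (cols) := lapply(.SD, function(col) col / total), .SDcols = cols]
```

Printing `DT` afterwards gives the proportions shown above (unrounded).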
I believe that something close to the following (which uses the relatively new set() function) will be quickest:
FWIW, calls to set() take the following form:
My suspicion about the relative speed of this, compared to other solutions, is based on this passage from data.table’s NEWS file, in the section on changes in Version 1.8.0:
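As an illustration of the kind of `set()` loop described above (a sketch under the assumption that all target columns are numeric, not the original poster's code), each column in 2:p is overwritten by reference. `set(x, i, j, value)` with `i` omitted assigns the whole column, and it skips the overhead of calling `[.data.table` on every iteration:

```r
library(data.table)

p  <- 3
DT <- data.table(id = c("A", "B", "C"), x1 = c(10, 20, 30), x2 = c(20, 30, 10))

# Compute the row totals once, before any column is overwritten.
total <- rowSums(DT[, 2:p, with = FALSE])

# Loop over target column positions; set() updates DT by reference.
for (j in 2:p) {
  set(DT, j = j, value = DT[[j]] / total)
}
```

Computing `total` up front matters: recomputing it inside the loop would use already-divided values for later columns.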