Using a data.table, which would be the fastest way to sweep out a statistic

Question


Editorial Team

Asked: June 1, 2026 (2026-06-01T17:51:08+00:00)

Using a data.table, which would be the fastest way to “sweep” out a statistic across a selection of columns?

Starting with (considerably larger versions of) DT:

p <- 3
DT <- data.table(id=c("A","B","C"),x1=c(10,20,30),x2=c(20,30,10))
DT.totals <- DT[, list(id,total = x1+x2) ]

I’d like to get to the following data.table result by indexing the target columns (2:p) in order to skip the key:

    id  x1  x2
[1,]    A   0.33    0.67
[2,]    B   0.40    0.60
[3,]    C   0.75    0.25

1 Answer

Editorial Team · Answer 1 · 2026-06-01T17:51:10+00:00

I believe that something close to the following (which uses the relatively new set() function) will be quickest:

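library(data.table)

DT <- data.table(id = c("A","B","C"), x1 = c(10,20,30), x2 = c(20,30,10))
total <- DT[ , x1 + x2]   # row totals used as the divisor

rr <- seq_len(nrow(DT))   # indices of all rows
# Overwrite columns 2 and 3 by reference, leaving the id column untouched
for(j in 2:3) set(DT, rr, j, DT[[j]]/total)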
DT
#      id        x1        x2
# [1,]  A 0.3333333 0.6666667
# [2,]  B 0.4000000 0.6000000
# [3,]  C 0.7500000 0.2500000
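Since the real tables are considerably larger, here is a minimal sketch of the same loop written against the column index range 2:p rather than the hard-coded x1 and x2. It assumes the id column is first and every remaining column is numeric; the names cols and total are only illustrative.

```r
library(data.table)

DT <- data.table(id = c("A","B","C"), x1 = c(10,20,30), x2 = c(20,30,10))

p    <- ncol(DT)     # assumption: columns 2..p are the numeric targets
cols <- 2:p

# Row totals over the target columns only (the id column is skipped)
total <- rowSums(DT[, cols, with = FALSE])

# i = NULL tells set() to assign to all rows of column j
for (j in cols) set(DT, i = NULL, j = j, value = DT[[j]] / total)
```

Computing total once, before the loop, matters: each pass overwrites a column in place, so recomputing the sum inside the loop would mix original and already-scaled values.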

FWIW, calls to set() take the following form:

# set(x, i, j, value), where: 
#     x is a data.table 
#     i contains row indices
#     j contains column indices 
#     value is the value to be assigned into the specified cells
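As a small illustration of that signature, the sketch below assigns into a single cell and then into a whole column. The toy table is made up for the example; j can be given as a column number or a column name.

```r
library(data.table)

DT <- data.table(id = c("A","B","C"), x1 = c(10,20,30))

# One cell: row 2 of column "x1"
set(DT, i = 2L, j = "x1", value = 99)

# Whole column by position: i = NULL means all rows
set(DT, i = NULL, j = 2L, value = DT[[2L]] / 10)

DT$x1
# [1] 1.0 9.9 3.0
```

Both calls modify DT by reference, so there is no need to assign the result back to DT.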

My suspicion about the relative speed of this, compared to other solutions, is based on this passage from data.table’s NEWS file, in the section on changes in Version 1.8.0:

o   New function set(DT,i,j,value) allows fast assignment to elements
    of DT. Similar to := but avoids the overhead of [.data.table, so is
    much faster inside a loop. Less flexible than :=, but as flexible
    as matrix subassignment. Similar in spirit to setnames(), setcolorder(),
    setkey() and setattr(); i.e., assigns by reference with no copy at all.

        M = matrix(1,nrow=100000,ncol=100)
        DF = as.data.frame(M)
        DT = as.data.table(M)
        system.time(for (i in 1:1000) DF[i,1L] <- i)   # 591.000s
        system.time(for (i in 1:1000) DT[i,V1:=i])     #   1.158s
        system.time(for (i in 1:1000) M[i,1L] <- i)    #   0.016s
        system.time(for (i in 1:1000) set(DT,i,1L,i))  #   0.027s
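For comparison, the same result can also be written as a single := call over .SD, which is the form the NEWS entry contrasts set() with. This is only a sketch for checking results against the set() loop; I have not benchmarked it, and the name row_total is just illustrative.

```r
library(data.table)

DT <- data.table(id = c("A","B","C"), x1 = c(10,20,30), x2 = c(20,30,10))

cols <- names(DT)[-1]    # assumption: every column except the first is numeric

# Row totals over the target columns
row_total <- DT[, rowSums(.SD), .SDcols = cols]

# One [.data.table call that rescales all target columns by reference
DT[, (cols) := lapply(.SD, function(x) x / row_total), .SDcols = cols]
```

Because this makes a single call to [.data.table rather than one per loop iteration, the overhead described in the NEWS entry is paid only once here; which form is faster on a large table is something to measure with system.time() on your own data.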
