I’d like to add a new column to my data.table, which contains data from

Asked: June 2, 2026

I’d like to add a new column to my data.table, which contains data from one of the other columns. The choice of column, however, varies per row – depending on the contents of another column. So:

for the data set:

     a_data b_data column_choice
[1,]     55      1             a
[2,]     56      2             a
[3,]     57      3             b

generated by:

dat=data.table(a_data = c(55, 56, 57), 
               b_data = c(1,  2,  3), 
               column_choice = c("a", "a", "b"))

I’d like a new column, ‘chosen’, which contains (per row) either the data from “a_data” or “b_data”, depending on the value of “column_choice”. The resulting data table will therefore be:

     a_data b_data column_choice chosen
[1,]     55      1             a     55
[2,]     56      2             a     56
[3,]     57      3             b      3

I have managed to get the desired effect using:

dat=dat[, data.table(.SD, chosen=.SD[[paste0(.SD$column_choice, "_data")]]),
        by=1:nrow(dat)]
dat$nrow = NULL

However, this feels quite clunky; perhaps there’s a simpler way to do it (that will no doubt also teach me something about R)?

In practice, the data frame also has lots of other columns that need to be preserved, more choices than just ‘a or b’, and several of these types of column to generate, so I’d rather not use the basic ifelse solution that may be appropriate for the basic example above.

Thank you very much for your help.

1 Answer

Editorial Team · Answer 1 · 2026-06-02T03:04:48+00:00

I think I’ve now found a properly vectorised one-liner that’s also faster than the other answers in this case.

petesFun2 uses data.table aggregation, as petesFun does, but it is now vectorised across column_choice (rather than working per item, as previously).

While petesFun2 is fine for my purposes, it does leave both the rows and columns in a different order. In the interests of comparison with the other answers, therefore, I’ve added petesFun2Clean which maintains the same ordering as the other answers.

petesFun2 <-function(myDat) {
  return(myDat[, cbind(.SD, chosen=.SD[[paste0(.BY$column_choice, "_data")]]),
               by=column_choice])
}

petesFun2Clean <-function(myDat) {
  myDat = copy(myDat) # To prevent reference issues
  myDat[, id := seq_len(nrow(myDat))] # Assign an id
  result = myDat[, cbind(.SD, chosen=.SD[[.BY$choice]]),
                 by=list(column_choice, choice=paste0(column_choice, "_data"))]

  # recover ordering and column order.
  return(result[order(id), 
                list(a_data, b_data, c_data, column_choice, chosen)]) 
}
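To illustrate the reordering mentioned above, here is a minimal sketch on a small made-up table. The table `toy` and its values are assumptions for illustration only (they are not the `test.dat` used in the benchmark); `petesFun2` is repeated so the snippet runs on its own.

```r
library(data.table)

# Hypothetical toy data with interleaved choices (not the benchmark data)
toy <- data.table(a_data = c(55, 56, 57),
                  b_data = c(1, 2, 3),
                  c_data = c(10, 20, 30),
                  column_choice = c("b", "a", "b"))

# Same definition as petesFun2 above
petesFun2 <- function(myDat) {
  myDat[, cbind(.SD, chosen = .SD[[paste0(.BY$column_choice, "_data")]]),
        by = column_choice]
}

res <- petesFun2(toy)
names(res)         # the grouping column is moved to the front
res$column_choice  # rows are gathered by group, in order of first appearance
```

Because the grouped result is assembled group by group, rows 1 and 3 (choice "b") end up together ahead of row 2, which is what petesFun2Clean undoes using the temporary id column.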

library(rbenchmark) # provides benchmark()
benchmark(benRes<-   myFun(test.dat),
          petesRes<- petesFun(test.dat),
          dowleRes<- dowleFun(test.dat),
          petesRes2<-petesFun2(test.dat),
          petesRes2Clean<- petesFun2Clean(test.dat),
          replications=25,
          columns=c("test", "replications", "elapsed", "relative"))

#                                         test replications elapsed  relative
# 1                  benRes <- myFun(test.dat)           25   0.337  4.160494
# 3             dowleRes <- dowleFun(test.dat)           25   0.191  2.358025
# 5 petesRes2Clean <- petesFun2Clean(test.dat)           25   0.122  1.506173
# 4           petesRes2 <- petesFun2(test.dat)           25   0.081  1.000000
# 2             petesRes <- petesFun(test.dat)           25   4.018 49.604938

identical(petesRes2, benRes)
# FALSE (due to row and column ordering)
identical(petesRes2Clean, benRes)
# TRUE

EDIT: I just noticed (as mentioned by Matthew in the comments) that := now works by group. So we can drop the cbind and simply do:

myDat[, chosen := .SD[[paste0(.BY$column_choice, "_data")]],
      by=column_choice]
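As a quick check, the by-group := form can be run on the example data from the question. This is a minimal sketch that assumes, as in the question, that the candidate columns are named "<choice>_data" and all have the same type.

```r
library(data.table)

# Example data from the question
dat <- data.table(a_data = c(55, 56, 57),
                  b_data = c(1, 2, 3),
                  column_choice = c("a", "a", "b"))

# Within each group, .BY$column_choice is a single value ("a" or "b"),
# so paste0() names exactly one column and .SD[[...]] returns that
# column's values for the rows of the group.
dat[, chosen := .SD[[paste0(.BY$column_choice, "_data")]], by = column_choice]

print(dat)
```

Since := adds the column by reference, the original row and column order is kept and any other columns in the table are left untouched.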
