I’d like to add a new column to my data.table that contains data from one of the other columns. Which column is used, however, varies per row, depending on the contents of another column. So:
for the data set:
a_data b_data column_choice
[1,] 55 1 a
[2,] 56 2 a
[3,] 57 3 b
generated by:
library(data.table)
dat=data.table(a_data = c(55, 56, 57),
               b_data = c(1, 2, 3),
               column_choice = c("a", "a", "b"))
I’d like a new column, ‘chosen’, which contains (per row) either the data from “a_data” or “b_data”, depending on the value of “column_choice”. The resulting data table will therefore be:
a_data b_data column_choice chosen
[1,] 55 1 a 55
[2,] 56 2 a 56
[3,] 57 3 b 3
I have managed to get the desired effect using:
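# one group per row, so .SD$column_choice is a single value in each group
dat=dat[, data.table(.SD, chosen=.SD[[paste0(.SD$column_choice, "_data")]]),
        by=1:nrow(dat)]
# drop the grouping column that by=1:nrow(dat) adds
dat$nrow = NULL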
However, this feels quite clunky; perhaps there’s a simpler way to do it (that will no doubt also teach me something about R)?
In practice, the data.table also has lots of other columns that need to be preserved, more choices than just ‘a’ or ‘b’, and several columns of this type to generate, so I’d rather not use the simple ifelse solution that would suit the basic example above.
Thank you very much for your help.
I think I’ve now found a properly vectorised one-liner that’s also faster than the other answers in this case.
petesFun2 uses data.table aggregation, as petesFun does, but it is now vectorised across column_choice (rather than per item, as previously).
While petesFun2 is fine for my purposes, it does leave both the rows and columns in a different order. In the interests of comparison with the other answers, therefore, I’ve added petesFun2Clean which maintains the same ordering as the other answers.
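To make the ordering point concrete, here is a minimal sketch on the question's data. petesFun2 itself is not reproduced in this post, so the grouped cbind call below is only an assumption about its general shape, and res, res_clean and orig_cols are illustrative names.

```r
library(data.table)

dat <- data.table(a_data = c(55, 56, 57),
                  b_data = c(1, 2, 3),
                  column_choice = c("a", "a", "b"))
orig_cols <- names(dat)

# Assumed shape of the grouped aggregation: j is evaluated once per
# distinct column_choice, not once per row.
res <- dat[, cbind(.SD, chosen = .SD[[paste0(.BY$column_choice, "_data")]]),
           by = column_choice]

# The grouping column now comes first, and rows come back group by group.
names(res)

# Put the columns back in their original order, with chosen last.
res_clean <- copy(res)
setcolorder(res_clean, c(orig_cols, "chosen"))
names(res_clean)
```

Restoring the original row order as well would need a row index column carried through the grouping, which this sketch leaves out.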
EDIT: I just noticed (as mentioned by Matthew in the comments) that := now works by group. So we can drop the cbind and simply do:
myDat[, chosen := .SD[[paste0(.BY$column_choice, "_data")]],
      by=column_choice]
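For anyone who wants to try this end to end, here is a self-contained sketch that applies the same grouped assignment to the example data from the question. It uses the name dat from the question rather than myDat, and it assumes that all of the *_data columns share one type, since every group writes into the same chosen column.

```r
library(data.table)

dat <- data.table(a_data = c(55, 56, 57),
                  b_data = c(1, 2, 3),
                  column_choice = c("a", "a", "b"))

# One group per distinct column_choice. Within a group, .BY$column_choice
# is a single value, so paste0() builds exactly one column name, and
# .SD[[...]] pulls that whole column for the group's rows.
dat[, chosen := .SD[[paste0(.BY$column_choice, "_data")]],
    by = column_choice]

# chosen is now 55, 56, 3, matching the table in the question.
print(dat)
```

Because := modifies dat in place, the existing rows and columns keep their order and chosen is simply appended as the last column.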