I’m currently trying to get used to the data.table package in R. I want to get the index of the last 1 occurring in each row of a data table, say a, and add it to a as a new column. My code for this is the following:
library(data.table)
a = data.table(matrix(sample(c(0,1),500,replace=TRUE),50,10))
a[,ind:=apply(a==1,1,function(x) max(which(x)))]
Nevertheless, I think this can be written in a shorter way using more data.table syntax. Therefore, my question is: how can I do this without the apply function within the j component of [?
Great question. Yes, the apply by row isn’t page efficient: which will be allocating for each and every row, and a==1 creates a new logical matrix as large as a.
In data.table we do things by column. Sometimes, it’s data.table-ish to use a for loop through columns (never a for loop through rows).
It’s a completely different style. But, I think, this should be more efficient, since it calls which() (a non-primitive, vectorized function) 10 times rather than nrow(a) times.
I didn’t do any speed tests, though, so I might have to eat my words.
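A minimal sketch of the column-wise loop described above, not a tested or benchmarked implementation. The seed and the names value_cols and rows are assumptions made for illustration. The idea is to start with ind as NA, walk the columns from last to first, and fill ind only for rows that have a 1 in the current column and no value yet.

```r
library(data.table)

set.seed(1)  # assumed seed, only so the example is reproducible
a <- data.table(matrix(sample(c(0, 1), 500, replace = TRUE), 50, 10))

value_cols <- names(a)     # hypothetical helper: V1..V10, captured before ind exists
a[, ind := NA_integer_]    # rows that contain no 1 at all will stay NA

# Loop through columns (never rows), from the last column to the first.
for (j in rev(seq_along(value_cols))) {
  # Rows with a 1 in column j whose ind has not been filled by a later column.
  rows <- which(a[[value_cols[j]]] == 1 & is.na(a$ind))
  set(a, i = rows, j = "ind", value = j)  # assign by reference, one column per pass
}
```

One difference from the apply version: a row with no 1 ends up as NA here, whereas max(which(x)) on such a row returns -Inf with a warning.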
See ?set.
In response to comment, to inspect how it works: set happens to return a pointer to the data.table, so we can look at the first few rows as it progresses. Printing those rows at each step should hopefully reveal how it’s working.
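One way such an inspection could look, as a sketch under assumptions: the small 6 by 4 table, the seed, and the names n_value_cols and rows are hypothetical and chosen only to keep the printed output short. Because set() returns the data.table, its result can be passed straight to head() and printed after each column is processed.

```r
library(data.table)

set.seed(42)  # assumed seed, only so the example is reproducible
a <- data.table(matrix(sample(c(0, 1), 24, replace = TRUE), 6, 4))

n_value_cols <- ncol(a)    # hypothetical helper: number of 0/1 columns before adding ind
a[, ind := NA_integer_]

for (j in n_value_cols:1) {
  rows <- which(a[[j]] == 1 & is.na(a$ind))
  # set() returns the modified data.table, so head() shows the state after this pass.
  print(head(set(a, i = rows, j = "ind", value = j), 3))
}
```

Each printed block shows ind filling in a little more: rows with a 1 in the last column get their value first, and earlier columns only fill rows that are still NA.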