I have been using data.table for some computation and am wondering what the possible return types for the j parameter are, so that my output is stacked correctly. I know data.frame is acceptable, so list must be as well? My function returns multiple rows and multiple columns for each id. So imagine:
dtb <- data.table(id=rep(1:5,20), a=1:100, b=sample(1:100, 100), c=sample(1:100, 100))
f <- function(dt) { return(c(dt$a+1, dt$b+1, dt$c+1))}
dtb[,f(.SD), by=id]
This clearly does not work properly: all the values end up stacked in a single column. This does:
dtb <- data.table(id=rep(1:5,20), a=1:100, b=sample(1:100, 100), c=sample(1:100, 100))
f <- function(dt) { return(data.frame(a=dt$a+1, b=dt$b+1, c=dt$c+1))}
dtb[,f(.SD), by=id]
Constructing these data.frames seems like a really inefficient way to do things. What are some suggestions? The by must be used.
When you wrote this
c(dt$a+1, dt$b+1, dt$c+1)
you should have expected a single vector (plus the group id column). Try returning a list instead, so that each element becomes its own column.
EDIT2 (there was an error in my earlier edit that I only noticed when posting the full code). To the question about “cheaper”: Here’s a benchmark run that shows list construction to be ‘cheaper’:
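The code from this answer did not survive, so here is a minimal sketch of the idea rather than the original. It assumes the same `dtb` as in the question. The names `f_list`, `f_df`, `res_list` and `res_df`, and the `set.seed` call, are mine and only there for illustration. The timing loop is a rough comparison with base `system.time`; the numbers depend on your machine and data.table version, so none are quoted here.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the sampled columns are reproducible
dtb <- data.table(id = rep(1:5, 20), a = 1:100,
                  b = sample(1:100, 100), c = sample(1:100, 100))

# Returning a list from j: each list element becomes one output column.
f_list <- function(dt) list(a = dt$a + 1, b = dt$b + 1, c = dt$c + 1)

# The data.frame version from the question, kept for comparison.
f_df <- function(dt) data.frame(a = dt$a + 1, b = dt$b + 1, c = dt$c + 1)

# Both calls give 100 rows with columns id, a, b, c.
res_list <- dtb[, f_list(.SD), by = id]
res_df   <- dtb[, f_df(.SD), by = id]

# Rough timing of the two variants (results vary between runs).
system.time(for (i in 1:200) dtb[, f_list(.SD), by = id])
system.time(for (i in 1:200) dtb[, f_df(.SD), by = id])
```

If the function wrapper is not needed, the list can also be written directly in j, as in `dtb[, list(a = a + 1, b = b + 1, c = c + 1), by = id]`, since column names are visible inside j.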