I want to do the same as explained here, i.e. adding missing rows to a data.table. The only additional difficulty I’m facing is that I want the number of key columns, i.e. those rows that are used for the self-join, to be flexible.
Here is a small example that basically repeats what is done in the link mentioned above:
df <- data.frame(fundID = rep(letters[1:4], each=6),
cfType = rep(c("D", "D", "T", "T", "R", "R"), times=4),
variable = rep(c(1,3), times=12),
value = 1:24)
DT <- as.data.table(df)
idCols <- c("fundID", "cfType")
setkeyv(DT, c(idCols, "variable"))
DT[CJ(unique(df$fundID), unique(df$cfType), seq(from=min(variable), to=max(variable))), nomatch=NA]
What bothers me is the last line. I want idCols to be flexible (for instance if I use it within a function), so I don’t want to type unique(df$fundID), unique(df$cfType) manually. However, I just don’t find any workaround for this. All my attempts to automatically split the subset of df into vectors, as needed by CJ, fail with the error message Error in setkeyv(x, cols, verbose = verbose) : Column ‘V1’ is type ‘list’ which is not (currently) allowed as a key column type.
CJ(sapply(df[, idCols], unique))
CJ(unique(df[, idCols]))
CJ(as.vector(unique(df[, idCols])))
CJ(unique(DT[, idCols, with=FALSE]))
I also tried building the expression myself:
str <- ""
for (i in idCols) {
str <- paste0(str, "unique(df$", i, "), ")
}
str <- paste0(str, "seq(from=min(variable), to=max(variable))")
str
[1] "unique(df$fundID), unique(df$cfType), seq(from=min(variable), to=max(variable))"
But then I don’t know how to use str. This all fails:
CJ(eval(str))
CJ(substitute(str))
CJ(call(str))
Does anyone know a good workaround?
I’ve never used the data.table package, so forgive me if I miss the mark here, but I think I’ve got it. There’s a lot going on here. Start by reading up on
do.call, which allows you to evaluate any function in a sort of non-traditional manner where arguments are specified by a supplied list (where each element is in the list is positionally matched to the function arguments unless explicitly named). Also notice that I had to specifymin(df$variable)instead of justmin(variable). Read Hadley’s page on scoping to get an idea of the issue here.