Using the data.table package in R, I am trying to create a Cartesian product of two data.tables using the merge method, as one would in base R.
In base R, the following works:
#assume this order data
orders <- data.frame(date = as.POSIXct(c('2012-08-28','2012-08-29','2012-09-01')),
                     first.name = as.character(c('John','George','Henry')),
                     last.name = as.character(c('Doe','Smith','Smith')),
                     qty = c(10,50,6))
#and these dates
dates <- data.frame(date = seq(from = as.POSIXct('2012-08-28'),
                               to = as.POSIXct('2012-09-07'), by = 'day'))
#get the unique customers
cust <- unique(orders[, c('first.name','last.name')])
#using merge from base R, get the cartesian product
merge(dates, cust, by = integer(0))
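Base R's `merge()` documentation states that when `by` has length zero (a zero-length vector or `NULL`), the result is the Cartesian product of `x` and `y`. A minimal self-contained sketch that reuses the question's sample data and checks that both spellings give 11 dates × 3 customers; the names `cp.int` and `cp.null` are illustrative only:

```r
# Sample data from the question
orders <- data.frame(date = as.POSIXct(c("2012-08-28", "2012-08-29", "2012-09-01")),
                     first.name = c("John", "George", "Henry"),
                     last.name = c("Doe", "Smith", "Smith"),
                     qty = c(10, 50, 6))
dates <- data.frame(date = seq(from = as.POSIXct("2012-08-28"),
                               to = as.POSIXct("2012-09-07"), by = "day"))
cust <- unique(orders[, c("first.name", "last.name")])

# A zero-length `by` (integer(0) or NULL) makes merge() return every row pairing
cp.int <- merge(dates, cust, by = integer(0))
cp.null <- merge(dates, cust, by = NULL)

# 11 dates x 3 customers = 33 rows, with columns date, first.name, last.name
print(nrow(cp.int))
print(dim(cp.null))
```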
However, the same technique does not work with data.table:
#data.table approach
library(data.table)
orders.dt <- data.table(orders)
dates.dt <- data.table(dates)
cust.dt <- unique(orders.dt[, list(first.name, last.name)])
#try to use merge (data.table) in the same manner as base
merge(dates.dt, cust.dt, by = integer(0))
Error in merge.data.table(dates.dt, cust.dt, by = integer(0)) : A non-empty vector of column names for `by` is required.
I want the result to contain every customer name for every date, just as in base R, but computed in a data.table-centric way. Is this possible?
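Two merge-free or merge-based workarounds can be sketched as follows; neither comes from the original thread. Option 1 keeps the `merge()` call by joining on a hypothetical constant column `k` (added to copies so `dates.dt` and `cust.dt` are not modified by reference) and needs `allow.cartesian = TRUE` because the 33-row result exceeds `nrow(x) + nrow(y)`. Option 2 groups `dates.dt` by `date` and returns all of `cust.dt` as the `j` result for each group:

```r
library(data.table)

# Rebuild the question's inputs so the sketch is self-contained
orders.dt <- data.table(
  date = as.POSIXct(c("2012-08-28", "2012-08-29", "2012-09-01")),
  first.name = c("John", "George", "Henry"),
  last.name = c("Doe", "Smith", "Smith"),
  qty = c(10, 50, 6)
)
dates.dt <- data.table(date = seq(from = as.POSIXct("2012-08-28"),
                                  to = as.POSIXct("2012-09-07"), by = "day"))
cust.dt <- unique(orders.dt[, list(first.name, last.name)])

# Option 1: merge on a hypothetical constant key `k`, added to copies only
d.k <- copy(dates.dt)[, k := 1L]
c.k <- copy(cust.dt)[, k := 1L]
# Every row matches every row, so the cartesian-join safeguard must be lifted
res.merge <- merge(d.k, c.k, by = "k", allow.cartesian = TRUE)
res.merge[, k := NULL]  # drop the helper column by reference

# Option 2: for each date, return all customer rows as the group's j result
res.group <- dates.dt[, as.list(cust.dt), by = date]

print(nrow(res.merge))
print(nrow(res.group))
```

Option 2 avoids the helper column entirely, while Option 1 stays closer to the `merge()` style the question asks about.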
If you first construct full names from the first and last names in the cust data frame, you can then use CJ (cross join). You cannot pass all three vectors (dates, first names, last names), since that would produce 99 rows (11 × 3 × 3) and the first names would be inappropriately mixed with last names. This returns the desired data.table object:
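A minimal sketch of the full-name-plus-`CJ` approach described above, assuming the question's `orders.dt`, `dates.dt` and `cust.dt` objects; the column name `full.name` and the `tstrsplit` step that recovers the two name columns are illustrative additions, and that split assumes names contain no spaces:

```r
library(data.table)

orders.dt <- data.table(
  date = as.POSIXct(c("2012-08-28", "2012-08-29", "2012-09-01")),
  first.name = c("John", "George", "Henry"),
  last.name = c("Doe", "Smith", "Smith"),
  qty = c(10, 50, 6)
)
dates.dt <- data.table(date = seq(from = as.POSIXct("2012-08-28"),
                                  to = as.POSIXct("2012-09-07"), by = "day"))
cust.dt <- unique(orders.dt[, list(first.name, last.name)])

# Collapse first and last name into a single value per customer
cust.dt[, full.name := paste(first.name, last.name)]

# CJ() builds every combination of its vector arguments: 11 dates x 3 customers
result <- CJ(date = dates.dt$date, full.name = cust.dt$full.name)

# Optionally split the names back apart (assumes no spaces inside a name)
result[, c("first.name", "last.name") := tstrsplit(full.name, " ", fixed = TRUE)]

print(nrow(result))
```

If names may contain spaces, a variant that avoids pasting altogether is to cross-join row indices, e.g. `idx <- CJ(i = seq_len(nrow(dates.dt)), j = seq_len(nrow(cust.dt)))`, and then combine `dates.dt[idx$i]` with `cust.dt[idx$j]` via `cbind()`.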