Note: This question and the following answers refer to data.table versions < 1.5.3; v1.5.3 was released in Feb 2011 to resolve this issue. See a more recent treatment (03-2012): Translating SQL joins on foreign keys to R data.table syntax
I’ve been digging through the documentation for the data.table package (a replacement for data.frame that’s much more efficient for certain operations), including Josh Reich’s presentation on SQL and data.table at the NYC R Meetup (pdf), but can’t figure this totally trivial operation out.
> x <- DT(a=1:3, b=2:4, key='a')
> x
     a b
[1,] 1 2
[2,] 2 3
[3,] 3 4
> y <- DT(a=1:3, c=c('a','b','c'), key='a')
> y
     a c
[1,] 1 a
[2,] 2 b
[3,] 3 c
> x[y]
     a b
[1,] 1 2
[2,] 2 3
[3,] 3 4
> merge(x,y)
  a b c
1 1 2 a
2 2 3 b
3 3 4 c
The docs say “When [the first argument] is itself a data.table, a join is invoked similar to base::merge but uses binary search on the sorted key.” Clearly this is not the case. Can I get the other columns from y into the result of x[y] with data.tables? It seems like it’s just taking the rows of x where the key matches the key of y, but ignoring the rest of y entirely…
You are quoting the wrong part of the documentation. If you have a look at the documentation of [.data.table, you will read:
I admit the description of the package (the part you quoted) is somewhat confusing, because it seems to say that the "[" operation can be used instead of merge. But I think what it says is: if x and y are both data.tables, we use a join on an index (which is invoked like merge) instead of binary search.
One more thing:
The data.table library I installed via install.packages was missing the merge.data.table method, so using merge would call merge.data.frame. After installing the package from R-Forge, R used the faster merge.data.table method.
You can check whether you have the merge.data.table method by checking the output of:
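The exact command from the original answer did not survive here. As a hedged sketch (not necessarily what the answer used), base R's methods() and getS3method() can show whether a merge method for data.table is registered. This assumes a current data.table release is installed and loaded.

```r
library(data.table)  # assumes a current data.table release is installed

# List the S3 methods registered for the merge() generic;
# merge.data.table should be among them once data.table is loaded
print(methods("merge"))

# getS3method() with optional = TRUE returns NULL instead of raising
# an error when no such method is registered
has_dt_merge <- !is.null(getS3method("merge", "data.table", optional = TRUE))
print(has_dt_merge)
```

If has_dt_merge is FALSE, merge(x, y) on two data.tables would fall through to merge.data.frame, which is the situation described above.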
EDIT [Answer no longer valid]: This answer refers to data.table version 1.3. In version 1.5.3 the behaviour of data.table changed and x[y] returns the expected results. Thank you Matthew Dowle, author of data.table, for pointing this out in the comments.
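For comparison, here is a minimal sketch of the question's example against a current data.table release. It assumes the data.table() constructor instead of the DT() call used in the question and has not been checked against 1.5.3 specifically. The join keeps the non-key column c from y alongside b from x:

```r
library(data.table)  # assumes a current data.table release

# Same data as in the question, built with the data.table() constructor
x <- data.table(a = 1:3, b = 2:4, key = "a")
y <- data.table(a = 1:3, c = c("a", "b", "c"), key = "a")

# Keyed join: rows of x matched on the shared key a,
# with y's non-key column c appended to x's columns
joined <- x[y]
print(joined)

# merge() dispatches to merge.data.table and yields the same columns here
merged <- merge(x, y)
print(merged)
```

Both results should have the columns a, b and c. With x[y], x's columns come first, followed by y's non-key columns.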