I’m brand new to the (completely marvelous) data.table package, and seem to have gotten stuck on a very basic, somewhat bizarre problem. I can’t post the exact data set I’m working with, for which I apologize — but I think the problem is simple enough to articulate that hopefully this will still be very clear.
Let’s say I have a data.table like so, with key x:
set1
x y
1: 1 a
2: 1 b
3: 1 c
4: 2 a
I want to return a subset of set1 containing all rows where x == 1. This is wonderfully simple in data.table: set1[J(1)]. Bam. Done. I can also assign z <- 1, and call set1[J(z)]. Again: works great.
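For reference, a minimal sketch of that toy case, assuming the table is built with data.table() and keyed with setkey() (the values are just the four rows shown above, not my real data):

```r
library(data.table)

# Toy stand-in for the real table: four rows, keyed on x
set1 <- data.table(x = c(1, 1, 1, 2), y = c("a", "b", "c", "a"))
setkey(set1, x)

# Keyed lookup with a literal value
literal_hit <- set1[J(1)]

# Same lookup through a variable; z is not a column name in set1,
# so it is found in the calling scope
z <- 1
variable_hit <- set1[J(z)]
```

Both calls return the three rows where x is 1.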
…except when I try to scale it up to my actual data set, which contains ~6M rows. When I call set1[J(1674)], I get back a 78-row return that’s exactly what I’m looking for. But I need to be able to look up (literally) 4M of these subsets. When I assign the value I’m searching for to a variable, id <- 1674, and call set1[J(id)]… R nearly takes down my desktop.
Clearly something I don’t understand is going on under the data.table hood, but I haven’t been able to figure out what. Googling and slogging through Stack Overflow suggest that this should work. Out of pure whimsy, I’ve tried:
id <- quote(1674)
set1[J(eval(id))]
…but that is far, far worse. What… what’s going on?
[ @mnel beat me to it as I was writing …]
Almost certainly, one column of set1 happens to be called "id", causing set1[J(id)] to self join set1$id to set1, ignoring the id in calling scope.
If so, there are several approaches to avoid scoping issues such as this: use a variable name that does not clash with any column name, or use the fact that a single name i is evaluated in calling scope, or that eval is eval'd in calling scope, too.
We do want to make this clearer, more robust and easier, so one thought is to add .. in some form, where .. borrows its meaning from the file system's .., meaning one level up. If .. were a prefix to symbols, you could then do something like colB == ..id inside the [ ] call, where == is used there for illustration. In that example colB is expected to be a column name and ..id will find id in calling scope (one level up). The thinking is that it would be quite clear to the reader of the code what the programmer intended.
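A minimal sketch of the clash and of two of the workarounds, on a toy table. The column layout and the name lookup_id are made up for illustration, and the real set1 may well differ:

```r
library(data.table)

# Toy stand-in for the real table; a key column named "id" is an assumption
set1 <- data.table(id = c(1L, 1L, 1L, 2L),
                   y  = c("a", "b", "c", "a"),
                   key = "id")

id <- 1L
has_clash <- "id" %in% names(set1)   # the variable shares its name with a column

# Inside [ ], J(id) resolves id to the column, so this is a self join
# of set1$id to set1 rather than a lookup of the single value 1
clash <- set1[J(id)]

# Workaround: a variable name that is not a column name (hypothetical name)
lookup_id <- 1L
by_rename <- set1[J(lookup_id)]

# Workaround: a single name as i is evaluated in calling scope,
# even when it shares its name with a column
id <- data.table(id = 1L)
by_name <- set1[id]
```

Here clash comes back with more rows than the three that match id 1, while by_rename and by_name each return just those three. On a table with millions of rows the self join is what makes R grind.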