Suppose I have the data.tables `DT` and `n` (a table of neighbors):
```r
set.seed(1)
library(data.table)
DT <- data.table(idx=rep(1:10, each=5), x=rnorm(50), y=letters[1:5], ok=rbinom(50, 1, 0.90))
# lookup table: each letter y maps to its neighbour y1 (a->b, b->c, c->d, d->e, e->a)
n <- data.table(y=letters[1:5], y1=letters[c(2:5,1)])
```
`n` is a lookup table. Whenever `ok == 0`, I want to look up the corresponding `y1` in `n` and take `x` from the row of `DT` that has that `y1` as its `y` and the same `idx`. For example, take row 4 of `DT`:
```
> DT
    idx          x y ok
 1:   1 -0.6264538 a  1
 2:   1  0.1836433 b  1
 3:   1 -0.8356286 c  1
 4:   1  1.5952808 d  0
 5:   1  0.3295078 e  1
 6:   2 -0.8204684 a  1
```
The `y1` from `n` for `d` is `e`:

```
> n[y == 'd']
   y y1
1: d  e
```

and the `idx` for row 4 is 1, so I would use:

```
> DT[idx == 1 & y == 'e', x]
[1] 0.3295078
```
I want the output to be a data.table just like `DT[ok == 0]`, but with each `x` replaced by the `x` of the row with the same `idx` whose `y` is the matching `y1` from `n`:

```
> output
   idx          x y ok
1:   1  0.3295078 d  0
2:   2 -0.3053884 d  0
3:   3  0.3898432 a  0
4:   5  0.7821363 a  0
5:   7  1.3586800 e  0
6:   8  0.7631757 d  0
```
I can think of a few ways of doing this with base R or with plyr… and maybe it's just late on Friday… but whatever sequence of merges this would require in data.table is beyond me!
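One way to express the lookup as data.table joins is sketched below. This is a minimal sketch, not code from the original post; it assumes a reasonably recent data.table that supports `on=` joins with a column mapping such as `.(idx, y1 = y)`, and the name `out` is my own. It performs two update joins: the first attaches `y1` from `n`, the second pulls the replacement `x` from `DT` by matching `out$idx`/`out$y1` against `DT$idx`/`DT$y`.

```r
library(data.table)

set.seed(1)
DT <- data.table(idx = rep(1:10, each = 5), x = rnorm(50),
                 y = letters[1:5], ok = rbinom(50, 1, 0.90))
n  <- data.table(y = letters[1:5], y1 = letters[c(2:5, 1)])

# Step 1: take the rows to fix; this is a new table, so DT is not modified below
out <- DT[ok == 0]

# Step 2: update join against the lookup table to attach the neighbour letter y1
out[n, y1 := i.y1, on = "y"]

# Step 3: update join back onto DT: for each row of out, copy x from the DT row
# with the same idx whose y equals out's y1 (i.x is the x column of DT)
out[DT, x := i.x, on = .(idx, y1 = y)]

# Drop the helper column to match the requested layout
out[, y1 := NULL]
print(out)
```

Because every `(idx, y)` pair in `DT` is unique and `y1` is a one-to-one mapping of `y`, each row of `out` matches exactly one row of `DT` in step 3.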
Great question. Using the functions in the other answers and wrapping Blue's answer into a function `blue`, how about the following. The benchmarks include the time to `setkey` in all cases.

The `order` is needed in the `identical` because `red` returns the result in the same order as `DT[ok==0]`, whereas `blue` appears to be ordered by `y1` in the case of ties in `idx`.

If `y1` is unwanted in the result, it can be removed instantly (regardless of table size) using `ans[,y1:=NULL]`; i.e., this can be included above to produce the exact result requested in the question, without affecting the timings at all.
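The benchmark code referred to in the answer is not reproduced here. As a hedged illustration of the points in the last two paragraphs (keying with `setkey`, comparing two results only after putting them in a common row order, and dropping `y1` by reference with `:=`), here is a minimal sketch. `res_a` and `res_b` are hypothetical stand-ins for two approaches; they are not the answer's `red` and `blue` functions. The comparison checks names and column values with `identical()` column by column, an assumption made so that table attributes are not part of the comparison.

```r
library(data.table)

set.seed(1)
DT <- data.table(idx = rep(1:10, each = 5), x = rnorm(50),
                 y = letters[1:5], ok = rbinom(50, 1, 0.90))
n  <- data.table(y = letters[1:5], y1 = letters[c(2:5, 1)])

# Hypothetical approach A: update joins with on=, no key required
res_a <- DT[ok == 0]
res_a[n, y1 := i.y1, on = "y"]
res_a[DT, x := i.x, on = .(idx, y1 = y)]

# Hypothetical approach B: key DT on (idx, y), then do a keyed lookup
setkey(DT, idx, y)
res_b <- n[DT[ok == 0], on = "y"]          # columns: y, y1, idx, x, ok
new_x <- DT[.(res_b$idx, res_b$y1), x]     # one x per (idx, y1) pair, via the key
res_b[, x := new_x]

# Removing a column with := NULL works by reference, without copying the table
res_a[, y1 := NULL]
res_b[, y1 := NULL]

# Column and row order can differ between approaches, so align both first
setcolorder(res_b, names(res_a))
a <- res_a[order(idx, y)]
b <- res_b[order(idx, y)]
same <- identical(names(a), names(b)) && all(mapply(identical, a, b))
print(same)
```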