Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8535077
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 11, 20262026-06-11T10:20:50+00:00 2026-06-11T10:20:50+00:00

If I have the data.tables DT and neighbors : set.seed(1) library(data.table) DT <- data.table(idx=rep(1:10,

  • 0

If I have the data.tables DT and neighbors:

set.seed(1)
library(data.table)
DT <- data.table(idx=rep(1:10, each=5), x=rnorm(50), y=letters[1:5], ok=rbinom(50, 1, 0.90))
n <- data.table(y=letters[1:5], y1=letters[c(2:5,1)])

n is a lookup table. Whenever ok == 0, I want to look up the corresponding y1 in n and use that value for x and the given idx. By way of example, row 4 of DT:

> DT
   idx          x y ok
1:   1 -0.6264538 a  1
2:   1  0.1836433 b  1
3:   1 -0.8356286 c  1
4:   1  1.5952808 d  0
5:   1  0.3295078 e  1
6:   2 -0.8204684 a  1

The y1 from n for d is e:

> n[y == 'd']
   y y1
1: d  e

and idx for row 4 is 1. So I would use:

> DT[idx == 1 & y == 'e', x]
[1] 0.3295078

I want my output to be a data.table just like DT[ok == 0] with all the x values replaced by their appropriate n[‘y1’] x value:

> output
   idx          x y ok
1:   1  0.3295078 d  0
2:   2 -0.3053884 d  0
3:   3  0.3898432 a  0
4:   5  0.7821363 a  0
5:   7  1.3586800 e  0
6:   8  0.7631757 d  0

I can think of a few ways of doing this with base R or with plyr… and maybe its late on Friday… but whatever the sequences of merges that this would require in data.table is beyond me!

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-11T10:20:52+00:00Added an answer on June 11, 2026 at 10:20 am

    Great question. Using the functions in the other answers and wrapping Blue’s answer into a function blue, how about the following. The benchmarks include the time to setkey in all cases.

    red = function() {
        ans = DT[ok==0]
          # Faster than setkey(DT,ok)[J(0)] if the vector scan is just once
          # If lots of lookups to "ok" need to be done, then setkey may be worth it
          # If DT[,ok:=as.integer(ok)] can be done first, then ok==0L slightly faster
    
        # After extracting ans in the original order of DT, we can now set the key :
        setkey(DT,idx,y)
        setkey(n,y)
    
        # Now working with the reduced ans ...
    
        ans[,y1:=n[y,y1,mult="first"]]
        # Add a new column y1 by reference containing the lookup in n
        # mult="first" because we know n's key is unique, for speed (to save looking
        # for groups of matches in n). Future version of data.table won't need this.
        # Also, mult="first" has the advantage of dropping group columns (so we don't
        # need [[2L]]). mult="first"|"last" turns off by-without-by of mult="all".
    
        ans[,x:=DT[ans[,list(idx,y1)],x,mult="first"]]
        # Changes the contents of ans$x by reference. The ans[,list(idx,y1)] part is
        # how to pick the columns of ans to join to DT's key when they are not the key
        # columns of ans and not the first 1:n columns of ans. There is no need to key
        # ans, especially since that would change ans's order and not strictly answer
        # the question. If idx and y1 were columns 1 and 2 of (unkeyed) ans then we
        # wouldn't need that part, just
        #    ans[,x:=DT[ans,x,mult="first"]]
        # would do (relying on DT having 2 columns in its key). That has the advantage
        # of not copying the idx and y1 columns into a new data.table to pass as the i
        # DT. To save that copy y1 could be moved to column 2 using setcolorder first.
    
        redans <<- ans
        }
    

    crdt(1e5)
    origDT = copy(DT)
    benchmark(blue={DT=copy(origDT); system.time(blue())},
              red={DT=copy(origDT); system.time(red())},
              fun={DT=copy(origDT); system.time(fun(DT,n))},
              replications=3, order="relative")
    
    test replications elapsed relative user.self sys.self user.child sys.child
     red            3   1.107    1.000     1.100    0.004          0         0
    blue            3   5.797    5.237     5.660    0.120          0         0
     fun            3   8.255    7.457     8.041    0.184          0         0
    
    crdt(1e6)
    [ .. snip .. ]
    test replications elapsed relative user.self sys.self user.child sys.child
     red            3  14.647    1.000    14.613    0.000          0         0
    blue            3  87.589    5.980    87.197    0.124          0         0
     fun            3 197.243   13.466   195.240    0.644          0         0
    
    identical(blueans[,list(idx,x,y,ok,y1)],redans[order(idx,y1)])
    # [1] TRUE
    

    The order is needed in the identical because red returns the result in the same order as DT[ok==0] whereas blue appears to be ordered by y1 in the case of ties in idx.

    If y1 is unwanted in the result it can be removed instantly (regardless of table size) using ans[,y1:=NULL]; i.e., this can be included above to produce the exact result requested in question, without affecting the timings at all.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I have data in two tables that I want to query from. Each table
I have data in 2 tables, and I want to create a report. Table
I have three tables of data: table: cars [10,000 rows] table: planes [2,000 rows]
I have two data tables in a data set for example lets say Servers:
I have two datagrids side by side bound to different data tables and each
I have two data.tables in R which have an identical set of columns. They
I have a data tables with this type of setup. Table A AID AName
I have the following lookup table: lkp <- data.frame( x=c(0,0.2,0.65,0.658,1.3,1.76,2.7), y=c(1,1,1,0.942,0.942, 0.92, 0.89) )
I have data in two tables. The first table has a Primary Key called
I have data of matrix stored in table as below tables: MatrixDimensions - MatrixId,

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.