Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8710187
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 13, 20262026-06-13T04:33:46+00:00 2026-06-13T04:33:46+00:00

I am attempting to understand the logic in the data.table from the documentation and

  • 0

I am attempting to understand the logic in the data.table from the documentation and a bit unclear. I know I can just try this and see what happens but I would like to make sure that there is no pathological case and therefore would like to know how the logic was actually coded. When two data.table objects have a different number of key columns, for example a has 2 and b has 3, and you run c <- a[b], will a and b be merged simply on the first two key columns or will the third column in a be automatically merged to the 3rd key column in b? An example:

require(data.table)
a <- data.table(id=1:10, t=1:20, v=1:40, key=c("id", "t"))
b <- data.table(id=1:10, v2=1:20, key="id")
c <- a[b]

This should select rows of a that match the id key column in b. For example, for id==1 in b, there are 2 rows in b and 4 rows in a that should generate 8 rows in c. This is indeed what seems to happen:

> head(c,10)
    id  t  v v2
 1:  1  1  1  1
 2:  1  1 21  1
 3:  1 11 11  1
 4:  1 11 31  1
 5:  1  1  1 11
 6:  1  1 21 11
 7:  1 11 11 11
 8:  1 11 31 11
 9:  2  2  2  2
10:  2  2 22  2

The other way to try it is to do:

d <-b[a]

This should do the same thing: for every row in a it should select the matching row in b: since a has an extra key column, t, that column should not be used for matching and a join based only on the first key column, id should be done. It seems like this is the case:

> head(d,10)
    id v2  t  v
 1:  1  1  1  1
 2:  1 11  1  1
 3:  1  1  1 21
 4:  1 11  1 21
 5:  1  1 11 11
 6:  1 11 11 11
 7:  1  1 11 31
 8:  1 11 11 31
 9:  2  2  2  2
10:  2 12  2  2

Can someone confirm? To be clear: is the third key column of a ever used in any of the merges or does data.table only use the min(length(key(DT))) of the two tables.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-13T04:33:47+00:00Added an answer on June 13, 2026 at 4:33 am

    Good question. First the correct terminology is (from ?data.table) :

    [A data.table] may have one key of one or more columns. This key can be used for row indexing instead of rownames.

    So “key” (singlular) not “keys” (plural). We can get away with “keys”, currently. But when secondary keys are added in future, there may then be multiple keys. Each key (singular) can have multiple columns (plural).

    Otherwise you’re absolutely correct. The following paragraph was improved in v1.8.2 based on feedback from others also confused. From ?data.table:

    When i is a data.table, x must have a key. i is joined to x using x’s key and the rows in x that match are returned. An equi-join is performed between each column in i to each column in x’s key; i.e., column 1 of i is matched to the 1st column of x’s key, column 2 to the second, etc. The match is a binary search in compiled C in O(log n) time. If i has fewer columns than x’s key then many rows of x will ordinarily match to each row of i since not all of x’s key columns will be joined to (a common use case). If i has more columns than x’s key, the columns of i not involved in the join are included in the result. If i also has a key, it is i’s key columns that are used to match to x’s key columns (column 1 of i’s key is joined to column 1 of x’s key, column 2 to column 2, and so on) and a binary merge of the two tables is carried out. In all joins the names of the columns are irrelevant. The columns of x’s key are joined to in order, either from column 1 onwards of i when i is unkeyed, or from column 1 onwards of i’s key.


    Following comments, in v1.8.3 (on R-Forge) this now reads (changes in bold) :

    When i is a data.table, x must have a key. i is joined to x using x’s key and the rows in x that match are returned. An equi-join is performed between each column in i to each column in x’s key; i.e., column 1 of i is matched to the 1st column of x’s key, column 2 to the second, etc. The match is a binary search in compiled C in O(log n) time. If i has fewer columns than x’s key then not all of x’s key columns will be joined to (a common use case) and many rows of x will (ordinarily) match to each row of i. If i has more columns than x’s key, the columns of i not involved in the join are included in the result. If i also has a key, it is i’s key columns that are used to match to x’s key columns (column 1 of i’s key is joined to column 1 of x’s key, column 2 of i’s key to column 2 of x’s key, and so on for as long as the shorter key) and a binary merge of the two tables is carried out. In all joins the names of the columns are irrelevant; the columns of x’s key are joined to in order, either from column 1 onwards of i when i is unkeyed, or from column 1 onwards of i’s key. In code, the number of join columns is determined by min(length(key(x)),if (haskey(i)) length(key(i)) else ncol(i)).

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Attempting to use the data series from this example no longer passes the JSONLint
I understand that there are better ways to do what I'm attempting, but this
I am attempting to understand why development had shifted from reshape to reshape2 package.
Attempting to better understand chaining on a hashtable. Seeking a simple table that hashes
I am attempting to understand the c library qsort in the context of pointers
Having never used awk before on Linux I am attempting to understand how it
Attempting to set up an automated texting service for customers, where people can text
Why can I not do this: char* p = new char[10]; void SetString(char *
I am attempting to create a Chrome extension to help me understand Chrome's webRequest
I am attempting to connect to a SQL Server 2008 R2 instance from another

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.