The Archive Base

Editorial Team
Asked: June 12, 2026

This is a bit of a philosophical question about data.table join syntax. I am finding more and more uses for data.tables, but still learning…

The join format X[Y] for data.tables is very concise, handy and efficient, but as far as I can tell, it only supports inner joins and right outer joins. To get a left or full outer join, I need to use merge:

  • X[Y, nomatch = NA] — all rows in Y — right outer join (default)
  • X[Y, nomatch = 0] — only rows with matches in both X and Y — inner join
  • merge(X, Y, all = TRUE) — all rows from both X and Y — full outer join
  • merge(X, Y, all.x = TRUE) — all rows in X — left outer join

It seems to me that it would be handy if the X[Y] join format supported all 4 types of joins. Is there a reason only two types of joins are supported?

For me, the nomatch = 0 and nomatch = NA parameter values are not very intuitive for the actions being performed. It is easier for me to understand and remember the merge syntax: all = TRUE, all.x = TRUE and all.y = TRUE. Since the X[Y] operation resembles merge much more than match, why not use the merge syntax for joins rather than the match function’s nomatch parameter?

Here are code examples of the 4 join types:

# sample X and Y data.tables
library(data.table)
X <- data.table(t = 1:4, a = (1:4)^2)
setkey(X, t)
X
#    t  a
# 1: 1  1
# 2: 2  4
# 3: 3  9
# 4: 4 16

Y <- data.table(t = 3:6, b = (3:6)^2)
setkey(Y, t)
Y
#    t  b
# 1: 3  9
# 2: 4 16
# 3: 5 25
# 4: 6 36

# all rows from Y - right outer join
X[Y]  # default
#    t  a  b
# 1: 3  9  9
# 2: 4 16 16
# 3: 5 NA 25
# 4: 6 NA 36

X[Y, nomatch = NA]  # same as above
#    t  a  b
# 1: 3  9  9
# 2: 4 16 16
# 3: 5 NA 25
# 4: 6 NA 36

merge(X, Y, by = "t", all.y = TRUE)  # same as above
#    t  a  b
# 1: 3  9  9
# 2: 4 16 16
# 3: 5 NA 25
# 4: 6 NA 36

identical(X[Y], merge(X, Y, by = "t", all.y = TRUE))
# [1] TRUE

# only rows in both X and Y - inner join
X[Y, nomatch = 0]  
#    t  a  b
# 1: 3  9  9
# 2: 4 16 16

merge(X, Y, by = "t")  # same as above
#    t  a  b
# 1: 3  9  9
# 2: 4 16 16

merge(X, Y, by = "t", all = FALSE)  # same as above
#    t  a  b
# 1: 3  9  9
# 2: 4 16 16

identical( X[Y, nomatch = 0], merge(X, Y, by = "t", all = FALSE) )
# [1] TRUE

# all rows from X - left outer join
merge(X, Y, by = "t", all.x = TRUE)
#    t  a  b
# 1: 1  1 NA
# 2: 2  4 NA
# 3: 3  9  9
# 4: 4 16 16

# all rows from both X and Y - full outer join
merge(X, Y, by = "t", all = TRUE)
#    t  a  b
# 1: 1  1 NA
# 2: 2  4 NA
# 3: 3  9  9
# 4: 4 16 16
# 5: 5 NA 25
# 6: 6 NA 36

Update: data.table v1.9.6 introduced the on= syntax, which allows ad hoc joins on fields other than the primary key. jangorecki’s answer to the question How to join (merge) data frames (inner, outer, left, right)? provides some examples of additional join types that data.table can handle.
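As a rough illustration of the on= syntax mentioned in the update, the sketch below repeats the joins on unkeyed copies of the sample tables. It assumes data.table v1.9.6 or later; the variable names right_join, inner_join and left_join are made up for this example.

```r
library(data.table)

# Same sample data as above, deliberately left unkeyed (no setkey call)
X <- data.table(t = 1:4, a = (1:4)^2)
Y <- data.table(t = 3:6, b = (3:6)^2)

# all rows from Y - right outer join, joining ad hoc on column t
right_join <- X[Y, on = "t"]

# only rows in both X and Y - inner join
inner_join <- X[Y, on = "t", nomatch = 0]

# all rows from X - left outer join, obtained by swapping the tables
# (the column order becomes t, b, a)
left_join <- Y[X, on = "t"]
```

Because no key is set, neither table is reordered by reference; the join column is named in each call instead.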

1 Answer

  1. Editorial Team added an answer on June 12, 2026 at 11:06 am

    To quote from the data.table FAQ 1.11 What is the difference between X[Y] and merge(X, Y)?

    X[Y] is a join, looking up X’s rows using Y (or Y’s key if it has one) as an index.

    Y[X] is a join, looking up Y’s rows using X (or X’s key if it has one) as an index.

    merge(X,Y) does both ways at the same time. The number of rows of X[Y] and Y[X] usually differ, whereas the number of rows returned by merge(X,Y) and merge(Y,X) is the same.

    BUT that misses the main point. Most tasks require something to be done on the data after a join or merge. Why merge all the columns of data, only to use a small subset of them afterwards? You may suggest merge(X[,ColsNeeded1],Y[,ColsNeeded2]), but that requires the programmer to work out which columns are needed. X[Y,j] in data.table does all that in one step for you. When you write X[Y,sum(foo*bar)], data.table automatically inspects the j expression to see which columns it uses. It will subset those columns only; the others are ignored. Memory is only created for the columns the j uses, and Y columns enjoy standard R recycling rules within the context of each group. Let’s say foo is in X, and bar is in Y (along with 20 other columns in Y). Isn’t X[Y,sum(foo*bar)] quicker to program and quicker to run than a merge of everything wastefully followed by a subset?
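The one-step X[Y, j] form described in the FAQ can be sketched with the question's sample tables, with column a standing in for foo and column b for bar. This is only an illustration: it assumes data.table v1.9.4 or later, where j is evaluated once over the whole join result unless by = .EACHI is given. The names total, total_merge and per_key are made up for the example.

```r
library(data.table)

# Sample tables from the question, keyed on t
X <- data.table(t = 1:4, a = (1:4)^2, key = "t")
Y <- data.table(t = 3:6, b = (3:6)^2, key = "t")

# Join and aggregate in one step; only the columns named in j are needed.
# nomatch = 0 drops the Y rows without a match so no NA enters the sum.
total <- X[Y, sum(a * b), nomatch = 0]

# Two-step equivalent: merge every column first, then aggregate
merged <- merge(X, Y, by = "t")
total_merge <- merged[, sum(a * b)]

# To evaluate j separately for each row of Y, ask for it explicitly
per_key <- X[Y, .(s = sum(a * b)), by = .EACHI, nomatch = 0]
```

Both total and total_merge hold the same value; the difference is that the first form never materialises the merged table.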


    If you want a left outer join (all rows of X), use Y[X]:

    le <- Y[X]
    mallx <- merge(X, Y, all.x = TRUE)
    # the column order is different so change to be the same as `merge`
    setcolorder(le, names(mallx))
    identical(le, mallx)
    # [1] TRUE
    

    If you want a full outer join

    # the unique values for the keys over both data sets
    unique_keys <- unique(c(X[,t], Y[,t]))
    Y[X[J(unique_keys)]]
    ##    t  b  a
    ## 1: 1 NA  1
    ## 2: 2 NA  4
    ## 3: 3  9  9
    ## 4: 4 16 16
    ## 5: 5 25 NA
    ## 6: 6 36 NA
    
    # The following will give the same with the column order X,Y
    X[Y[J(unique_keys)]]
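One way to check the full outer join above is to compare it with merge(X, Y, all = TRUE). The sketch below is an illustration, not part of the quoted FAQ: it sorts the keys first, since merge returns rows in key order while a J() lookup follows the order of the keys it is given, and it compares with check.attributes = FALSE on the assumption that the two results may carry different key attributes. The names full_xy, full_merge and same are made up for the example.

```r
library(data.table)

# Sample tables from the question, keyed on t
X <- data.table(t = 1:4, a = (1:4)^2, key = "t")
Y <- data.table(t = 3:6, b = (3:6)^2, key = "t")

# Every key value that appears in either table, in ascending order
unique_keys <- sort(unique(c(X[, t], Y[, t])))

# Full outer join via two lookups; column order is t, a, b
full_xy <- X[Y[J(unique_keys)]]

# Reference result from merge
full_merge <- merge(X, Y, by = "t", all = TRUE)

# Compare values only, ignoring attributes such as the key
same <- isTRUE(all.equal(full_xy, full_merge, check.attributes = FALSE))
```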
    
© 2021 The Archive Base. All Rights Reserved