The Archive Base

Editorial Team
Asked: June 16, 2026

I’m trying to join two datasets together. Call them x and y. I believe that the IDs in y are a subset of the IDs in x, but not in the strict sense: I know that x contains more IDs than y, but I don’t know the mapping. That is, some (but not all) of the IDs in x and y can be matched 1:1.

My ultimate goal is to figure out where this 1:1 mapping fails and flag these observations. I thought merge would be the way to go but maybe not. An example is below:

id <- c(1:10, 1:100)

X1 <- rnorm(110, mean = 0, sd = 1)
year <- c("2004","2005","2006","2001","2002") 
year <- rep(year, 22)

month <- c("Jul","Aug","Sep","Oct","Nov","Dec","Jan","Feb","Mar","Apr")
month <- rep(month, 11)

#dataset X
x <- cbind(id, X1, month, year)

#dataset Y
id2 <- c(1:10, 200)
Y1 <- rnorm(11, mean = 0 , sd = 1)
y <- cbind(id2,Y1)

#merge on the IDs; but we get an error because when id2 == 200 in y we don't 
#have a match in x 
result <- merge(x, y, by.x="id", by.y = "id2", all =TRUE)

The merge threw an error because id2 == 200 had no match in the x dataset. Unfortunately, I lost the ID as well (it should equal 200 in row 111):

tail(result) 
      id                   X1 month year         Y1
106   95  -0.0748386054887876   Nov 2002         NA
107   96    0.196765325477989   Dec 2004         NA
108   97    0.527922135906927   Jan 2005         NA
109   98    0.197927230533413   Feb 2006         NA
110   99 -0.00720474886698309   Mar 2001         NA
111 <NA>                 <NA>  <NA> <NA> -0.9664941

What’s more, I get duplicate observations on the ID variable in the merged file. The id2 == 1 observation exists only once in y, but the merge copied it twice (e.g. Y1 takes on the value 1.55 twice).

head(result)
   id                 X1 month year       Y1
1   1  -0.67371266313441   Jul 2004 1.553220
2   1 -0.318666983469993   Jul 2004 1.553220
3  10 -0.608192898092431   Apr 2002 1.234325
4  10  -0.72299929212347   Apr 2002 1.234325
5 100 -0.842111221826554   Apr 2002       NA
6  11  -0.16316681842082   Jul 2004       NA

This merge has made things more complicated than I intended. I was hoping to examine every observation in x, figure out where id matched id2 in y, and flag the ones that didn’t. So I would get a new vector, call it flag, that takes the value 1 if x$id has a match in y$id2 and 0 otherwise. This way, I would know where the 1:1 mapping failed. I could potentially get some traction by recoding the NAs, but what about the error that gets thrown when id2 == 200? It just discards the information.

I have tried appending by rows with no luck, and it looks like I should give up on merge as well. Perhaps it’s better to write a loop or function to do something along these lines:

for every observation in x

id2 = which(id2) corresponds to id-month-year

flag = 1 if length of above is == 1, 0 otherwise

etc.

Hopefully this all makes sense. I’d be very grateful for any help or guidance.

1 Answer

  1. Editorial Team
    Added an answer on June 16, 2026 at 9:26 pm

    To find which values of x$id also appear in y$id2, you can use

    x$id %in% y$id2
    

    to get a logical vector that is TRUE wherever an element of x$id appears in y$id2. This does not guarantee a 1-to-1 correspondence, however; it only tells you that at least one match exists. You can then add this vector to your data frame

    x$match.y <- x$id %in% y$id2
    

    to see what rows of x have a corresponding ID in y.
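
One caveat before trying this: the question builds x and y with cbind(), which returns a matrix, and `$` does not work on a matrix. The sketch below assumes both are rebuilt as data frames (my assumption, not part of the original example) and converts the logical vector into the 0/1 flag the question asks for. The set.seed() call is there only to make the sketch reproducible.

```r
# Assumption: x and y are data frames rather than the cbind() matrices
# from the question, so that `$` column access works.
set.seed(1)  # only for reproducibility of the random columns
x <- data.frame(
  id    = c(1:10, 1:100),
  X1    = rnorm(110),
  month = rep(c("Jul","Aug","Sep","Oct","Nov","Dec","Jan","Feb","Mar","Apr"), 11),
  year  = rep(c("2004","2005","2006","2001","2002"), 22)
)
y <- data.frame(id2 = c(1:10, 200), Y1 = rnorm(11))

# flag is 1 where x$id has at least one match in y$id2, and 0 otherwise
x$flag <- as.integer(x$id %in% y$id2)

# Count flagged and unflagged rows
table(x$flag)
```

Because IDs 1 to 10 each occur twice in x, 20 rows are flagged here even though only 10 distinct IDs match.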

    To see which observations are 1-to-1, you could do something like

    y$id2[duplicated(y$id2)]  # vector of duplicated elements in y$id2
    (x$id %in% y$id2) & !(x$id %in% y$id2[duplicated(y$id2)])
    

    to filter out elements that appear more than once in y$id2. You can also add this to x:

    x$match.y.unique <- (x$id %in% y$id2) & !(x$id %in% y$id2[duplicated(y$id2)])
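
Note that the check above only looks for repeats in y$id2. In the question's example the repeats are on the x side (IDs 1 to 10 occur twice in x$id), so a stricter test would probably need to look at both columns. A minimal sketch, assuming x and y are data frames and using the hypothetical names dup.x, dup.y, and match.one.to.one:

```r
# Assumption: x and y are data frames; only the ID columns matter here.
x <- data.frame(id = c(1:10, 1:100))
y <- data.frame(id2 = c(1:10, 200))

# IDs that occur more than once on either side
dup.x <- unique(x$id[duplicated(x$id)])
dup.y <- unique(y$id2[duplicated(y$id2)])

# TRUE only when the ID is present in y and is unique in both x and y
x$match.one.to.one <- (x$id %in% y$id2) &
  !(x$id %in% dup.x) &
  !(x$id %in% dup.y)

# Number of rows of x that match exactly one row of y, one-to-one
sum(x$match.one.to.one)
```

With the example data this count is zero, because every ID that matches (1 to 10) is duplicated in x.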
    

    The same procedure can be done for y to determine what rows of y match in x, and which ones match uniquely.
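
As a rough sketch of that reverse direction, again assuming x and y are data frames (the column names match.x and match.x.unique are made up for illustration):

```r
# Assumption: x and y are data frames rather than cbind() matrices.
x <- data.frame(id = c(1:10, 1:100), X1 = rnorm(110))
y <- data.frame(id2 = c(1:10, 200), Y1 = rnorm(11))

# Rows of y whose ID appears in x at all
y$match.x <- y$id2 %in% x$id

# Rows of y whose ID appears in x and is not repeated in x
y$match.x.unique <- y$match.x & !(y$id2 %in% x$id[duplicated(x$id)])

# IDs in y with no partner in x (here, the id2 == 200 row)
y$id2[!y$match.x]

# With data frames, a full outer merge keeps the unmatched y row,
# with NA in the columns that come from x
result <- merge(x, y, by.x = "id", by.y = "id2", all = TRUE)
result[result$id == 200, ]
```

My reading, which the answer above does not state, is that the lost ID in the question comes from merging cbind() matrices rather than from merge itself; with data frames the id 200 row should survive the merge.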

