Editorial Team
Asked: June 10, 2026


I have two very large data sets that look like this:

merge_data <- data.frame(ID = c(1,2,3,4,5,6,7,8,9,10), 
                         position = c("yes","no","yes","no","yes", 
                                      "no","yes","no","yes","yes"),
                         school = c("a","b","a","a","c","b","c","d","d","e"),
                         year1 = c(2000,2000,2000,2001,2001,2000,
                                   2003,2005,2008,2009))
# year2 is derived after construction: data.frame() cannot see year1 while it is being built
merge_data$year2 <- merge_data$year1 - 1


merge_data

   ID position school year1 year2
1   1      yes      a  2000  1999
2   2       no      b  2000  1999
3   3      yes      a  2000  1999
4   4       no      a  2001  2000
5   5      yes      c  2001  2000
6   6       no      b  2000  1999
7   7      yes      c  2003  2002
8   8       no      d  2005  2004
9   9      yes      d  2008  2007
10 10      yes      e  2009  2008



merge_data_2 <- data.frame(year=c(1999,1999,2000,2000,2000,2001,2003
                                  ,2012,2009,2009,2008,2002,2009,2005,
                                  2001,2000,2002,2000,2008,2005),
                           amount=c(100,200,300,400,500,600,700,800,900,
                                    1000,1100,1200,1300,1400,1500,1600,
                                    1700,1800,1900,2000), 
                           ID=c(1,1,2,2,2,3,3,3,5,6,8,9,10,13,15,17,19,20,21,7))


  merge_data_2
   year amount ID
1  1999    100  1
2  1999    200  1
3  2000    300  2
4  2000    400  2
5  2000    500  2
6  2001    600  3
7  2003    700  3
8  2012    800  3
9  2009    900  5
10 2009   1000  6
11 2008   1100  8
12 2002   1200  9
13 2009   1300 10
14 2005   1400 13
15 2001   1500 15
16 2000   1600 17
17 2002   1700 19
18 2000   1800 20
19 2008   1900 21
20 2005   2000  7

And what I want is:

 ID position school year1 year2 amount
 1    yes    a      2000  1999  300
 2    no     b      2000  1999  1200
10    yes    e      2009  2008  1300

For ID = 1, the amount is 300 because merge_data_2 has two rows with ID = 1 (amounts 100 and 200) whose year (1999) equals either year1 or year2 for ID = 1 in merge_data.

So basically I want to merge on ID and year, under two conditions:

  1. ID from merge_data matches ID from merge_data_2.
  2. Either year1 or year2 from merge_data matches year from merge_data_2.

Then the merged result should contain the sum of amount for each ID.

I think the code would look something like this:

merge_data_final <- merge(merge_data, merge_data_2, 
                          merge_data$ID == merge_data_2$ID && (merge_data$year1 || 
                            merge_data$year2 == merge_data_2$year))

Then I would somehow aggregate the amount by ID.

I know this code is wrong. I have been looking at the plyr and reshape libraries, but I am having difficulty getting the hang of them.

Any help would be great. Thanks!


1 Answer

  1. Editorial Team
    Added an answer on June 10, 2026 at 2:55 am

    As noted above, I think you have some discrepancies between your example input and output data. Here’s the basic approach: you were on the right track with reshape2. Use melt() to put your data into long format, so that you join on a single column instead of on the either/or condition you had before.

    library(reshape2)
    #melt into long format
    merge_data_m <- melt(merge_data, measure.vars = c("year1", "year2"))
    #merge together, specifying the joining columns
    merge(merge_data_m, merge_data_2, by.x = c("ID", "value"), by.y = c("ID", "year"))
    #-----
      ID value position school variable amount
    1  1  1999      yes      a    year2    100
    2  1  1999      yes      a    year2    200
    3  2  2000       no      b    year1    500
    4  2  2000       no      b    year1    300
    5  2  2000       no      b    year1    400
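
The merge above still leaves one row per matching record, while the question asks for one summed amount per ID. A minimal sketch of that last step follows, assuming the two data frames exactly as posted in the question. To keep the example self-contained it builds the long format in base R with rbind() rather than reshape2's melt(); the names `id_cols`, `long`, `merged` and `totals` are illustrative only. It also relies on year1 and year2 never being equal, so no row of merge_data_2 can be counted twice.

```r
# Data as posted in the question (year2 derived after construction).
merge_data <- data.frame(
  ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  position = c("yes", "no", "yes", "no", "yes", "no", "yes", "no", "yes", "yes"),
  school = c("a", "b", "a", "a", "c", "b", "c", "d", "d", "e"),
  year1 = c(2000, 2000, 2000, 2001, 2001, 2000, 2003, 2005, 2008, 2009)
)
merge_data$year2 <- merge_data$year1 - 1

merge_data_2 <- data.frame(
  year = c(1999, 1999, 2000, 2000, 2000, 2001, 2003, 2012, 2009, 2009,
           2008, 2002, 2009, 2005, 2001, 2000, 2002, 2000, 2008, 2005),
  amount = c(100, 200, 300, 400, 500, 600, 700, 800, 900, 1000,
             1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000),
  ID = c(1, 1, 2, 2, 2, 3, 3, 3, 5, 6, 8, 9, 10, 13, 15, 17, 19, 20, 21, 7)
)

# Long format: each ID appears twice, once per candidate year.
# Assumption: this stands in for melt(); year1 and year2 are kept as columns.
id_cols <- merge_data[c("ID", "position", "school", "year1", "year2")]
long <- rbind(
  cbind(id_cols, value = merge_data$year1),
  cbind(id_cols, value = merge_data$year2)
)

# Join on ID plus the single candidate-year column.
merged <- merge(long, merge_data_2,
                by.x = c("ID", "value"), by.y = c("ID", "year"))

# Sum the amount per original row, then order by ID for readability.
totals <- aggregate(amount ~ ID + position + school + year1 + year2,
                    data = merged, FUN = sum)
totals <- totals[order(totals$ID), ]
print(totals)
```

With the data as posted, this should leave three rows: ID 1 with amount 300, ID 2 with 1200, and ID 10 with 1300, which matches the result requested in the question.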
    