I have two datasets that I need to merge.
The first is a large dataset with two columns: STUDYID and DISCHARG (the date on which the patient was discharged).
The second has fewer observations than the first. It also has two columns: STUDYID and CALL_MAD (the date on which a nurse called the patient after discharge). Not every discharge is followed by a call from a nurse.
The first table is:
STUDYID  DISCHARG
10011    2008-10-29
10011    2008-11-7
10011    2008-11-18
10011    2009-10-17
10011    2010-1-2
10011    2010-1-22
The second table is:
STUDYID  CALL_MAD
10011    2009-10-19
10011    2010-1-25
The final table I want is:
STUDYID  DISCHARG    CALL_MAD
10011    2008-10-29
10011    2008-11-7
10011    2008-11-18
10011    2009-10-17  2009-10-19
10011    2010-1-2
10011    2010-1-22   2010-1-25
I hope this is clear. Thanks in advance.
Jane
I had the same idea as thelatemail: for each CALL_MAD date, first find the latest DISCHARG date for the same STUDYID that is < (or possibly <=) the call date, then merge that result back onto the original dataset. I think that is the best that can be done with the data structured as it is, although this logic can break down, e.g. if a nurse's call did not relate to the most recent discharge. Ideally you would add the DISCHARG date to the second table as a secondary key, so that you could join on STUDYID and DISCHARG directly without making any assumptions.
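For illustration, if the second table did carry a DISCHARG column, the merge would reduce to a plain left join on the composite key (STUDYID, DISCHARG). The sketch below assumes that hypothetical column and a hypothetical helper name, `keyed_join`; the real second table does not have it.

```python
def keyed_join(discharges, calls_with_key):
    """Left-join calls onto discharges on the composite key (STUDYID, DISCHARG).

    discharges:     list of (studyid, discharg) tuples
    calls_with_key: list of (studyid, discharg, call_mad) tuples; the DISCHARG
                    column here is hypothetical, since the real table lacks it.
    Dates are compared as strings, so both tables must use the same format.
    """
    # If the same key appears twice in calls_with_key, the last one wins.
    lookup = {(sid, d): c for sid, d, c in calls_with_key}
    # Every discharge row is kept; unmatched rows get None for CALL_MAD.
    return [(sid, d, lookup.get((sid, d))) for sid, d in discharges]
```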
Anyway, here is the code I used.
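In case it helps, here is a minimal sketch of the same two-step logic in plain Python. It is not the code referred to above. The function name `match_calls`, the `inclusive` flag (switching between < and <=), and the rule of keeping the earliest call when several calls land on the same discharge are all assumptions made for this sketch.

```python
from bisect import bisect_left, bisect_right
from datetime import date, datetime


def parse(d: str) -> date:
    # strptime accepts non-zero-padded months/days such as "2008-11-7"
    return datetime.strptime(d, "%Y-%m-%d").date()


def match_calls(discharges, calls, inclusive=False):
    """Attach each call to the latest earlier discharge of the same STUDYID.

    discharges: list of (studyid, discharg) tuples, dates as strings
    calls:      list of (studyid, call_mad) tuples, dates as strings
    inclusive:  if True, a call on the discharge day itself also matches (<=)
    Returns (studyid, discharg, call_mad or None) rows in discharge order.
    """
    # Step 1: sorted discharge dates per study ID.
    by_id = {}
    for sid, d in discharges:
        by_id.setdefault(sid, []).append(parse(d))
    for dates in by_id.values():
        dates.sort()

    # Step 2: for each call, find the latest discharge before it (< or <=).
    matched = {}  # (studyid, discharge date) -> original call string
    for sid, c in sorted(calls, key=lambda r: (r[0], parse(r[1]))):
        dates = by_id.get(sid, [])
        search = bisect_right if inclusive else bisect_left
        i = search(dates, parse(c))
        if i > 0:
            # Assumption: calls are visited in date order, so setdefault
            # keeps the earliest call if several map to one discharge.
            matched.setdefault((sid, dates[i - 1]), c)

    # Step 3: merge back onto the full discharge table (a left join).
    return [(sid, d, matched.get((sid, parse(d)))) for sid, d in discharges]


# Example data taken from the question.
DISCHARGES = [("10011", "2008-10-29"), ("10011", "2008-11-7"),
              ("10011", "2008-11-18"), ("10011", "2009-10-17"),
              ("10011", "2010-1-2"), ("10011", "2010-1-22")]
CALLS = [("10011", "2009-10-19"), ("10011", "2010-1-25")]

if __name__ == "__main__":
    for sid, d, c in match_calls(DISCHARGES, CALLS):
        print(sid, d, c or "")
```

Sorting each patient's discharge dates once and using binary search keeps the lookup cheap even when the first dataset is large; the same caveat applies as above, since a call is always attributed to the most recent earlier discharge.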