What I have:
I have a “master” dataframe that has the following columns:
userid, condition
Since there are four experiment conditions, I also have four dataframes that carry answer information, with the following columns:
userid, condition, answer1, answer2
Now, I’d like to join these, so all combinations of user IDs, conditions and their answers to these conditions are merged. Each condition should only have the correct answer in the appropriate column, per row.
Short, self-contained example:
master = data.frame(userid=c("foo","foo","foo","foo","bar","bar","bar","bar"), condition=c("A","B","C","D","A","B","C","D"))
cond_a = data.frame(userid=c("foo","bar"), condition="A", answer1=c("1","1"), answer2=c("2","2"))
cond_b = data.frame(userid=c("foo","bar"), condition="B", answer1=c("3","3"), answer2=c("4","4"))
cond_c = data.frame(userid=c("foo","bar"), condition="C", answer1=c("5","5"), answer2=c("6","6"))
cond_d = data.frame(userid=c("foo","bar"), condition="D", answer1=c("7","7"), answer2=c("8","8"))
How do I merge all conditions into the master, so that the master table looks as follows?
  userid condition answer1 answer2
1    bar         A       1       2
2    bar         B       3       4
3    bar         C       5       6
4    bar         D       7       8
5    foo         A       1       2
6    foo         B       3       4
7    foo         C       5       6
8    foo         D       7       8
I’ve tried the following:
temp = merge(master, cond_a, all.x=TRUE)
Which gives me:
  userid condition answer1 answer2
1    bar         A       1       2
2    bar         B    <NA>    <NA>
3    bar         C    <NA>    <NA>
4    bar         D    <NA>    <NA>
5    foo         A       1       2
6    foo         B    <NA>    <NA>
7    foo         C    <NA>    <NA>
8    foo         D    <NA>    <NA>
But as soon as I do this…
merge(temp, cond_b, all.x=TRUE)
There are no values for condition B. How come?
  userid condition answer1 answer2
1    bar         A       1       2
2    bar         B    <NA>    <NA>
3    bar         C    <NA>    <NA>
4    bar         D    <NA>    <NA>
5    foo         A       1       2
6    foo         B    <NA>    <NA>
7    foo         C    <NA>    <NA>
8    foo         D    <NA>    <NA>
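As for the "how come": when no `by` argument is given, merge() joins on every column name the two data frames share. After the first merge, temp already contains answer1 and answer2, so they become join keys too, and the NA values in temp's condition-B rows never match the "3" and "4" in cond_b. A small sketch that makes this visible (the names by_cols and second are only illustrative):

```r
master = data.frame(userid=c("foo","foo","foo","foo","bar","bar","bar","bar"), condition=c("A","B","C","D","A","B","C","D"))
cond_a = data.frame(userid=c("foo","bar"), condition="A", answer1=c("1","1"), answer2=c("2","2"))
cond_b = data.frame(userid=c("foo","bar"), condition="B", answer1=c("3","3"), answer2=c("4","4"))

temp <- merge(master, cond_a, all.x = TRUE)

# Default join keys: all column names common to both inputs.
by_cols <- intersect(names(temp), names(cond_b))
print(by_cols)

# With all.x = TRUE only the rows of temp are kept, and the B rows of temp
# (answer1 = NA, answer2 = NA) have no partner in cond_b, so they stay NA.
second <- merge(temp, cond_b, all.x = TRUE)
print(second[second$condition == "B", ])
```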
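You can use Reduce() and complete.cases() together: merge all of the data frames with Reduce(), then keep only the complete rows.
Reduce() might take some getting accustomed to. You define a function of two arguments and provide a list of objects to apply it to repeatedly, with each result passed on as the first argument of the next call. Thus, merging through Reduce() is like merging master with cond_a, then merging that result with cond_b, and so on through cond_d, whether written as a series of separate statements or as one nested call.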
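To make that equivalence concrete, here is a rough sketch. It assumes the function handed to Reduce() is a merge() with all=TRUE; the helper name merge_all and the intermediate names step1 to step4 are hypothetical and only there for illustration.

```r
master = data.frame(userid=c("foo","foo","foo","foo","bar","bar","bar","bar"), condition=c("A","B","C","D","A","B","C","D"))
cond_a = data.frame(userid=c("foo","bar"), condition="A", answer1=c("1","1"), answer2=c("2","2"))
cond_b = data.frame(userid=c("foo","bar"), condition="B", answer1=c("3","3"), answer2=c("4","4"))
cond_c = data.frame(userid=c("foo","bar"), condition="C", answer1=c("5","5"), answer2=c("6","6"))
cond_d = data.frame(userid=c("foo","bar"), condition="D", answer1=c("7","7"), answer2=c("8","8"))

# Assumed two-argument function: a full outer merge on all shared columns.
merge_all <- function(x, y) merge(x, y, all = TRUE)

# Reduce() folds the list from left to right with merge_all.
via_reduce <- Reduce(merge_all, list(master, cond_a, cond_b, cond_c, cond_d))

# The same thing written as separate statements...
step1 <- merge_all(master, cond_a)
step2 <- merge_all(step1, cond_b)
step3 <- merge_all(step2, cond_c)
step4 <- merge_all(step3, cond_d)

# ...or as one nested call.
nested <- merge_all(merge_all(merge_all(merge_all(master, cond_a), cond_b), cond_c), cond_d)
```

All three forms should produce the same data frame; Reduce() just saves you from spelling out the chain by hand.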
complete.cases() creates a logical vector indicating whether each row is “complete” (has no NA values in the specified columns); this logical vector can be used to subset the merged data.frame.
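Putting the two pieces together, a minimal sketch of the whole approach could look like the following. It assumes the merges are full outer joins (all=TRUE): because answer1 and answer2 become join keys after the first merge, the all-NA placeholder rows from master and the real answer rows from each condition end up side by side, and complete.cases() then drops the placeholders. The names merged, keep and result are illustrative.

```r
master = data.frame(userid=c("foo","foo","foo","foo","bar","bar","bar","bar"), condition=c("A","B","C","D","A","B","C","D"))
cond_a = data.frame(userid=c("foo","bar"), condition="A", answer1=c("1","1"), answer2=c("2","2"))
cond_b = data.frame(userid=c("foo","bar"), condition="B", answer1=c("3","3"), answer2=c("4","4"))
cond_c = data.frame(userid=c("foo","bar"), condition="C", answer1=c("5","5"), answer2=c("6","6"))
cond_d = data.frame(userid=c("foo","bar"), condition="D", answer1=c("7","7"), answer2=c("8","8"))

# Full outer merge of master with every condition table, one after another.
merged <- Reduce(function(x, y) merge(x, y, all = TRUE),
                 list(master, cond_a, cond_b, cond_c, cond_d))

# TRUE for rows without any NA, FALSE for the leftover placeholder rows.
keep <- complete.cases(merged)

# Keep only the complete rows and renumber them.
result <- merged[keep, ]
rownames(result) <- NULL
print(result)
```

If the real data can legitimately contain NA answers, subsetting on complete.cases() would drop those rows as well, so this sketch only fits data like the example, where every user answered every condition.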