Having df1 and df2 as follows:
df1 <- read.table(text =" x y z
1 1 1
1 2 1
1 1 2
2 1 1
2 2 2",header=TRUE)
df2 <- read.table(text =" a b c
1 1 1
1 2 8
1 1 2
2 6 2",header=TRUE)
I can ask of the data a bunch of things like:
df2[ df2$b == 6 | df2$c == 8 ,] #any rows where b=6 plus c=8 in df2
#and additive conditions
df2[ df2$b == 6 & df2$c == 8 ,] # zero rows
between data.frame:
df1[ df1$z %in% df2$c ,] # rows in df1 where values in z are in c (allrows)
This gives me all rows:
df1[ (df1$x %in% df2$a) &
(df1$y %in% df2$b) &
(df1$z %in% df2$c) ,]
but shouldn’t this give me all rows of df1 too:
df1[ df1$z %in% df2$c | df1$b == 9,]
What I am really hoping to do is to subset df1 an df2 on three column conditions,
so that I only get rows in df1 where a,b,c all equal x,y,z at the same time within a row. In real data i will have more than 3 columns but I will still want to subset on 3 additive column conditions.
So subsetting my example data df1 on df2 my result would be:
df1
1 1 1
1 1 2
Playing with syntax has confusedme more and the SO posts are all variaion of what I want that actually lead to more confusion for me.
I figured out I can do this:
merge(df1,df2, by.x=c("x","y","z"),by.y=c("a","b","c"))
which gives me what I want, but I would like to understand why I am wrong in my [ attempts.
In addition to your nice solution using
merge(thanks for that, I always forgetmerge), this can be achieved in base using?interactionas follows. There may be other variations of this, but this is the one I am familiar with:Now to answer your question: First, I think there’s a typo (corrected) in:
You would get an error, because the first part evaluates to
and the second evaluates to:
You do a
|operation on unequal lengths getting the error:Edit: If you have multiple columns then you can choose the interaction as such. For example, if you want to get from
df1the rows where the first two columns match with that ofdf2, then you could simply do: