I'm an R beginner and am having difficulty with the following pretty simple problem:
I have two data frames (All_df, Bad_df) and want to generate a third such that
All_df - Bad_df = Good_df
> All_df
Row#  Originator  Recipient  Date        Time
   4           1          6  2000-05-16  16:15:00
   7           2          7  2000-05-16  16:25:00
  22           2          4  2000-07-04  18:05:00
  25           2          9  2000-08-07  05:23:00
  10           3          2  2000-06-17  18:07:00
  13           4          8  2000-06-21  06:49:00
> Bad_df
Row#  Originator  Recipient  Date        Time
   4           2          6  2000-05-16  16:15:00
   7           2          7  2000-05-16  16:25:00
  22           6          4  2000-07-04  18:05:00
  25          12          9  2000-08-07  05:23:00
  10          30          2  2000-06-17  18:07:00
  13          32          8  2000-06-21  06:49:00
I want to generate Good_df similar to this:
> Good_df
Row#  Originator  Recipient  Date        Time
   4           1          6  2000-05-16  16:15:00
  10           3          2  2000-06-17  18:07:00
  13           4          8  2000-06-21  06:49:00
Essentially, I need a function that searches All_df$Originator for values that appear in Bad_df$Originator and drops any matching rows, returning the remaining rows as Good_df.
I have tried
Good_df <- subset(All_df, Originator %in% Bad_df$Originator)
however, the row counts of the data frames look a little off:
> nrow(All_df)
[1] 26,032
> nrow(Bad_df)
[1] 1,452
> nrow(Good_df)
[1] 12,395
Any assistance would be greatly appreciated.
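Quite intuitively,
subset(All_df, Originator %in% Bad_df$Originator)
gives you the subset of All_df for bad originators. What you want is to negate your filter to get the subset of good (or non-bad) originators, using the ! operator:
Good_df <- subset(All_df, !Originator %in% Bad_df$Originator)
If you are uncomfortable with the precedence rule (%in% binds more tightly than !), you can add a set of parentheses:
Good_df <- subset(All_df, !(Originator %in% Bad_df$Originator))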
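
To make this concrete, here is a minimal sketch of the negated filter run on the sample rows from the question. The Date and Time columns are left out for brevity, and Good_df2 is a name introduced here only for illustration; it shows what should be an equivalent form using plain indexing instead of subset().

```r
# Toy versions of the question's data frames (Date/Time columns omitted;
# the Row# values are kept as row names).
All_df <- data.frame(
  Originator = c(1, 2, 2, 2, 3, 4),
  Recipient  = c(6, 7, 4, 9, 2, 8),
  row.names  = c("4", "7", "22", "25", "10", "13")
)
Bad_df <- data.frame(
  Originator = c(2, 2, 6, 12, 30, 32),
  Recipient  = c(6, 7, 4, 9, 2, 8),
  row.names  = c("4", "7", "22", "25", "10", "13")
)

# Keep only the rows whose Originator does NOT appear in Bad_df$Originator.
Good_df <- subset(All_df, !(Originator %in% Bad_df$Originator))

# The same filter written with logical indexing (Good_df2 is illustrative).
Good_df2 <- All_df[!(All_df$Originator %in% Bad_df$Originator), ]

# Sanity check on row counts: good rows plus matched rows should add up
# to the rows of All_df.
n_matched <- sum(All_df$Originator %in% Bad_df$Originator)
stopifnot(nrow(Good_df) + n_matched == nrow(All_df))

print(Good_df)
```

The row-count check at the end is one way to confirm that the filter behaves as intended on the full data: the un-negated subset and the negated subset together should account for every row of All_df.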