I have a data frame set up with one column as a factor with several levels. I’d like to extract rows that do not have a unique value for that column (i.e. the level is present in multiple rows).
So for some simple test data:
factor dat1 dat2 dat3
a 1.0 1.0 1.0
a 1.0 0.9 1.0
b 0.9 0.8 0.6
c 0.9 1.0 0.0
I’d like to retain only the first two rows. What is the best way to do this? Preferably, I’d like to be able to make more general queries, e.g. extract rows for levels of the factor present in at least 3 rows, exactly 2 rows, etc.
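Here’s a solution with table (assuming the data frame’s name is df): count the rows for each level with table(df$factor), keep the levels whose count is >= 2, and select the rows whose factor value is among those levels.
If you want to use an exact criterion instead, change >= to ==.
The result for the test data is the first two rows (the two rows with level a).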
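A minimal sketch of the table approach in R, rebuilding the test data from the question. The answer’s original code did not survive, so the exact expression and the names min_rows, counts, keep and result are assumptions made for illustration:

```r
# Rebuild the test data from the question (the column is literally named "factor").
df <- data.frame(
  factor = factor(c("a", "a", "b", "c")),
  dat1 = c(1.0, 1.0, 0.9, 0.9),
  dat2 = c(1.0, 0.9, 0.8, 1.0),
  dat3 = c(1.0, 1.0, 0.6, 0.0)
)

min_rows <- 2  # hypothetical threshold name, not from the original answer

# Count the rows per level, then keep the names of the levels meeting the threshold.
counts <- table(df$factor)
keep <- names(counts)[counts >= min_rows]  # use == instead of >= for an exact count

# Select the rows whose level is in the kept set.
result <- df[df$factor %in% keep, ]
print(result)
```

Changing min_rows to 3 covers the "at least 3 rows" query; for the test data that returns a data frame with zero rows, since no level occurs three times.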