I have to select a bunch of data from a data frame depending on certain conditions. The data frame looks roughly like this:
  F1 F2 D1 D2
1 A1 B1  1  0
2 A1 B1  1  1
3 A1 B1  0  0
4 A1 B2  1  0
5 A1 B2  0  0
6 A2 B2  1  0
7 A2 B2  1  1
The Fx are factors, and the Dx are data values. What I have to do is the following:
- Find rows with data values that match a specific pattern.
- For each row that matches that pattern, find all rows that have the same factors.
- For each unique factor combination, apply some operation to all rows that have that combination.
For example,
factors <- unique(data[data$D1 == 1 & data$D2 == 1, c("F1", "F2")])
gives me step 1 and most of step 2.
And with
data[data$F1 %in% factors$F1 & data$F2 %in% factors$F2, ]
I’m getting closer to the solution, but with the example data above this selects all rows. Rows 4 and 5 should not be selected, because their combination (A1, B2) is not an exact match. How can I add a condition that requires the %in% matches to happen on the same row?
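One base R way to make both conditions hold on the same row, offered as a sketch rather than the only answer, is to match on a single combined key per row instead of on each factor separately. The block rebuilds the example data (values copied from the table above), pastes F1 and F2 into one key, and also shows `merge()` as an equivalent inner join on both columns. The `sep = "|"` separator assumes `|` never occurs in the factor levels, and `sum` in the `aggregate()` call is only a placeholder for whatever operation step 3 actually needs.

```r
# Example data from the question.
data <- data.frame(
  F1 = factor(c("A1", "A1", "A1", "A1", "A1", "A2", "A2")),
  F2 = factor(c("B1", "B1", "B1", "B2", "B2", "B2", "B2")),
  D1 = c(1, 1, 0, 1, 0, 1, 1),
  D2 = c(0, 1, 0, 0, 0, 0, 1)
)

# Steps 1-2: factor combinations of the rows that match the pattern.
factors <- unique(data[data$D1 == 1 & data$D2 == 1, c("F1", "F2")])

# One key per row, so F1 and F2 must match together on the same row.
# Assumes "|" does not appear in any factor level.
key_data    <- paste(data$F1, data$F2, sep = "|")
key_factors <- paste(factors$F1, factors$F2, sep = "|")
selected <- data[key_data %in% key_factors, ]

# Alternative: an inner join on both columns gives the same rows.
selected2 <- merge(data, factors, by = c("F1", "F2"))

# Step 3: apply an operation per factor combination (sum is a placeholder).
result <- aggregate(cbind(D1, D2) ~ F1 + F2, data = selected, FUN = sum)
print(result)
```

With the example data this keeps rows 1, 2, 3, 6 and 7 and drops rows 4 and 5.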
I feel like this should be a common operation, so R probably has a clever way of doing this.
Any ideas? Thanks.
You can use the indexing of the data.table package to select all rows that have to be manipulated.
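A minimal sketch of that idea, assuming the data.table package is installed: key the table on F1 and F2 with `setkey()`, then index it with the table of wanted combinations, which joins on both columns together rather than on each one independently. The `by =` grouping then covers step 3; `sum()` again only stands in for the real operation, and character columns are used instead of factors for brevity.

```r
library(data.table)  # assumes data.table is installed

dt <- data.table(
  F1 = c("A1", "A1", "A1", "A1", "A1", "A2", "A2"),
  F2 = c("B1", "B1", "B1", "B2", "B2", "B2", "B2"),
  D1 = c(1, 1, 0, 1, 0, 1, 1),
  D2 = c(0, 1, 0, 0, 0, 0, 1)
)

# Steps 1-2: unique (F1, F2) combinations of rows matching the pattern.
factors <- unique(dt[D1 == 1 & D2 == 1, .(F1, F2)])

# Key on both factor columns; indexing with `factors` then joins on the
# (F1, F2) pair, so only exact combinations are kept.
setkey(dt, F1, F2)
selected <- dt[factors]

# Step 3: one result row per combination (sum is a placeholder).
result <- selected[, .(D1 = sum(D1), D2 = sum(D2)), by = .(F1, F2)]
print(result)
```

Recent data.table versions also accept `dt[factors, on = c("F1", "F2")]`, which performs the same join without setting a key first.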