Say I had the dataframe
df <- data.frame('A' = c('a','a','a','a','b','b','b','b','b'),
'B' = c('y','y','z','z','y','y','y','z','z'),
'value'=c(1 , 2 , 2 , 3 , 2 , 3 , 1 , 2 , 2))
so it looked like this
A B value
a y 1
a y 2
a z 2
a z 3
b y 2
b y 3
b y 1
b z 2
b z 2
I could get the mean of each subset of A and B using the query
with(df, aggregate(df, by = list(A, B), FUN = mean))
which after a bit of manipulation gives
A B value
a y 1.5
b y 2.0
a z 2.5
b z 2.0
Is there a way of doing this but only calculating the mean of the highest x values in each subset? If we take x as 2 in this example, the means of the subsets ay, az, and bz would not change, as they each have only two entries (so the top x entries are the entire subset). However, the subset by has three entries, so we would want the mean of its two highest values (2 and 3), and the output table would look like
A B value
a y 1.5
b y 2.5
a z 2.5
b z 2.0
I find it easier to use the formula interface to aggregate. Your original version:
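With the formula interface, the plain per-group mean can be written roughly as below. This is a sketch built on the df from the question; the name res_mean is only chosen here for illustration.

```r
# Rebuild the example data frame from the question
df <- data.frame(A = c('a','a','a','a','b','b','b','b','b'),
                 B = c('y','y','z','z','y','y','y','z','z'),
                 value = c(1, 2, 2, 3, 2, 3, 1, 2, 2))

# Mean of value within each combination of A and B
res_mean <- aggregate(value ~ A + B, data = df, FUN = mean)
res_mean
```

The formula value ~ A + B names the column to summarise on the left and the grouping columns on the right, so the result keeps the A and B column names without any further manipulation.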
You can get your desired version by using an anonymous function that computes the mean of the tail of the sorted values:
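For example, with x = 2, such a call might look like the sketch below. The names x and res_top are chosen here for illustration, and the data frame is the one from the question.

```r
# Rebuild the example data frame from the question
df <- data.frame(A = c('a','a','a','a','b','b','b','b','b'),
                 B = c('y','y','z','z','y','y','y','z','z'),
                 value = c(1, 2, 2, 3, 2, 3, 1, 2, 2))

x <- 2  # number of highest values to average in each subset

# sort() puts the values in ascending order, tail(., x) keeps the last
# (largest) x of them, and mean() averages what is left
res_top <- aggregate(value ~ A + B, data = df,
                     FUN = function(v) mean(tail(sort(v), x)))
res_top
```

When a subset has x or fewer values, tail() simply returns all of them, so those groups keep their ordinary mean; only the b/y group, which has three values, changes (from 2.0 to 2.5).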