Say I had the dataframe df <- data.frame(‘A’ = c(‘a’,’a’,’a’,’a’,’b’,’b’,’b’,’b’,’b’), ‘B’ = c(‘y’,’y’,’z’,’z’,’y’,’y’,’y’,’z’,’z’), ‘value’=c(1

Question

Asked: June 5, 2026 (2026-06-05T05:06:18+00:00)

Say I had the dataframe

df <- data.frame('A' = c('a','a','a','a','b','b','b','b','b'),
                 'B' = c('y','y','z','z','y','y','y','z','z'),
                 'value'=c(1  , 2 , 2 , 3 , 2 , 3 , 1 , 2 , 2))

so it looked like this

A B value  
a y     1  
a y     2  
a z     2  
a z     3  
b y     2  
b y     3  
b y     1  
b z     2  
b z     2

I could get the mean of each subset of A and B using the query

with(df, aggregate(df, by = list(A, B), FUN = mean))

which after a bit of manipulation gives

A B value  
a y   1.5  
b y   2.0  
a z   2.5  
b z   2.0

Is there a way of doing this but only calculating the mean of the highest x values in each subset? If we take x as 2 in this example, the means of the subsets ay, az, and bz would not change, as each has only two entries (so the top x entries are the entire subset). However, the subset by has three entries, so we would want the mean of its two highest values (2 and 3), and the output table would look like

A B value  
a y   1.5  
b y   2.5  
a z   2.5  
b z   2.0

1 Answer

Editorial Team · Answer 1 · 2026-06-05T05:06:20+00:00

I find it easier to use the formula interface to aggregate, as follows:

Your original version:

aggregate(value ~ A + B, data = df, FUN = mean)
  A B value
1 a y   1.5
2 b y   2.0
3 a z   2.5
4 b z   2.0

You can get your desired version by using an anonymous function that computes the mean of the tail of the sorted values:

aggregate(value ~ A + B, data = df, FUN = function(x) mean(tail(sort(x), 2)))
  A B value
1 a y   1.5
2 b y   2.5
3 a z   2.5
4 b z   2.0
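
If the number of top values needs to vary, one possible sketch is to pull the anonymous function out into a small named helper. The name top_n_mean and its argument n are made up for illustration and are not part of base R. It relies on aggregate() forwarding extra arguments to FUN, and on sort() dropping NA values by default.

```r
# Data from the question
df <- data.frame(A = c('a','a','a','a','b','b','b','b','b'),
                 B = c('y','y','z','z','y','y','y','z','z'),
                 value = c(1, 2, 2, 3, 2, 3, 1, 2, 2))

# Hypothetical helper: mean of the n largest values of x.
# When x has fewer than n values, tail() returns all of them,
# so small groups fall back to the ordinary mean.
top_n_mean <- function(x, n) mean(tail(sort(x), n))

# Extra arguments to aggregate() are passed on to FUN, so n is given by name
res <- aggregate(value ~ A + B, data = df, FUN = top_n_mean, n = 2)
print(res)
```

With n = 2 this should reproduce the table above; changing n changes how many of the highest values enter each group's mean.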
