Here is a sample: > tmp label value1 value2 1 aa_x_x xx xx 2

Question

Asked: May 23, 2026

Here is a sample:

> tmp
    label   value1  value2
1   aa_x_x  xx      xx
2   bc_x_x  xx      xx
3   aa_x_x  xx      xx
4   bc_x_x  xx      xx

How can I calculate the median of value1 (or some other statistic of the corresponding values in the other data frame columns) over all rows that share a label, taking into account only the first two letters of the label (i.e. “aa_1_1” and “aa_s_3” count as the same label)? The list of labels is finite and available.

I have read about aggregate, %in%, subset and substr, but I am unable to compile anything useful and simple.

Here is what I hope to get:

> tmp.result
    label   median1 some.calculation2
1   aa      xx      xx
2   bc      xx      xx
3   aa      xx      xx
4   bc      xx      xx

Thank you very much.

1 Answer

Editorial Team · Answer 1 · 2026-05-23T14:32:45+00:00

Have you tried making a new data frame (I’ll call it tmp2) that is a copy of tmp with tmp2$label set to substr(tmp$label, 1, 2)? From there, you can, for example, use tapply(tmp2$value1, tmp2$label, mean) to get the mean of value1 within each two-letter label, or pass median instead of mean to get the median.
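The suggestion above can be sketched in base R roughly as follows. The numbers in tmp are made up for illustration (the question only shows "xx" placeholders), and the names tmp.result, tmp.rows and mean2 are hypothetical. This is a minimal sketch, not the only way to do it.

```r
# Hypothetical sample data; the values are invented for illustration
tmp <- data.frame(
  label  = c("aa_1_1", "bc_2_1", "aa_s_3", "bc_x_x"),
  value1 = c(1, 10, 3, 20),
  value2 = c(2, 4, 6, 8),
  stringsAsFactors = FALSE
)

# Copy the data and keep only the first two letters of each label
tmp2 <- tmp
tmp2$label <- substr(tmp2$label, 1, 2)

# One value per two-letter label
med1  <- tapply(tmp2$value1, tmp2$label, median)
mean2 <- tapply(tmp2$value2, tmp2$label, mean)

# Variant A: one row per distinct label
tmp.result <- data.frame(
  label   = names(med1),
  median1 = as.vector(med1),
  mean2   = as.vector(mean2)
)

# Variant B: one row per original row (the layout shown in the question);
# ave() repeats each group's statistic on every row of that group
tmp.rows <- data.frame(
  label   = tmp2$label,
  median1 = ave(tmp2$value1, tmp2$label, FUN = median),
  mean2   = ave(tmp2$value2, tmp2$label, FUN = mean)
)
```

Variant A is usually what "aggregate by label" means, while Variant B matches the four-row result the question asks for. Which one fits depends on whether the per-label statistic should be repeated on every row.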

An option using dplyr

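library(dplyr)
tmp %>%
   # group by the part of the label before the first underscore
   group_by(label=sub('_.*$', '', label)) %>%
   # transmute keeps one row per input row, as in the desired output
   transmute(median1=median(value1), mean1=mean(value2))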

Or data.table

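 library(data.table)
 # shorten the label first so the grouping key is what appears in the result
 setDT(tmp)[, label := sub('_.*$', '', label)][,
    c('median1', 'mean1') := list(median(value1), mean(value2)),
    by = label][, c('label', 'median1', 'mean1'), with=FALSE]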
