Editorial Team
Asked: May 27, 2026

I would like to aggregate a data.frame by an identifier variable called ensg.

I would like to aggregate a data.frame by an identifier variable called ensg. The data frame looks like this:

  chromosome probeset               ensg symbol    XXA_00    XXA_36    XXB_00
1          X  4938842 ENSMUSG00000000003   Pbsn  4.796123  4.737717  5.326664

I want to compute the mean of each numeric column over the rows that share the same ensg value. The difficulty is that I would like to leave the other identifier variables, chromosome and symbol, untouched, since they are also constant within each ensg.

In the end I would like a data.frame with the identifier columns chromosome, ensg and symbol, plus the mean of each numeric column over rows with the same ensg. I implemented this with ddply, but it is very slow compared to aggregate:
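The assumption that chromosome and symbol never vary within an ensg can be checked before aggregating. The following is a minimal sketch on a made-up stand-in for the real data; the values, the gene name GeneB and the helper name id.columns are hypothetical.

```r
# Hypothetical stand-in for the real data frame.
eset <- data.frame(
  ensg       = c("E1", "E1", "E2", "E2"),
  chromosome = c("X", "X", "1", "1"),
  symbol     = c("Pbsn", "Pbsn", "GeneB", "GeneB"),
  stringsAsFactors = FALSE
)
id.columns <- c("ensg", "chromosome", "symbol")

# One row per distinct combination of the identifier columns.
ids <- unique(eset[, id.columns])

# If every ensg maps to exactly one (chromosome, symbol) pair,
# no ensg value appears twice in ids.
consistent <- anyDuplicated(ids$ensg) == 0

# A deliberately inconsistent copy: E2 now appears on two chromosomes.
bad <- rbind(eset, data.frame(ensg = "E2", chromosome = "2", symbol = "GeneB",
                              stringsAsFactors = FALSE))
bad.ids <- unique(bad[, id.columns])
bad.consistent <- anyDuplicated(bad.ids$ensg) == 0
```

If the check fails, grouping by all three columns would return more than one row for the affected ensg values.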

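library(plyr)

# numeric.columns is assumed to hold the integer indices of the numeric columns.
spec.mean <- function(eset.piece) {
  # Keep the descriptive columns of the first row and append the column means.
  cbind(eset.piece[1, -numeric.columns],
        t(colMeans(eset.piece[, numeric.columns])))
}

mean.eset <- ddply(eset.consensus.grand, .(ensg), spec.mean, .progress = "tk")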

My first aggregate implementation looks like this:

mean.eset <- aggregate(eset[, numeric.columns], by = list(ensg = eset$ensg), FUN = mean, na.rm = TRUE)

It is much faster. The problem with aggregate is that I then have to reattach the descriptive variables. I have not figured out how to use my custom function with aggregate, since aggregate passes FUN individual column vectors rather than data frames.
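For reference, reattaching the descriptive variables after a plain aggregate call can be sketched as a merge on ensg. This is only an illustration on made-up data: the values, the gene name GeneB and the helper name id.columns are hypothetical, and numeric.columns is given here as column names rather than indices.

```r
# Hypothetical stand-in for the real data frame.
eset <- data.frame(
  chromosome = c("X", "X", "1", "1"),
  probeset   = c(101, 102, 103, 104),
  ensg       = c("E1", "E1", "E2", "E2"),
  symbol     = c("Pbsn", "Pbsn", "GeneB", "GeneB"),
  XXA_00     = c(1, 3, 10, NA),
  XXA_36     = c(2, 4, 20, 40),
  stringsAsFactors = FALSE
)
numeric.columns <- c("XXA_00", "XXA_36")
id.columns <- c("ensg", "chromosome", "symbol")

# Step 1: per-ensg means of the numeric columns only.
means <- aggregate(eset[, numeric.columns],
                   by = list(ensg = eset$ensg),
                   FUN = mean, na.rm = TRUE)

# Step 2: one row of descriptive columns per ensg.
ids <- unique(eset[, id.columns])

# Step 3: join the descriptive columns back onto the means.
mean.eset <- merge(ids, means, by = "ensg")
```

This relies on chromosome and symbol being constant within each ensg; otherwise the merge would return several rows per ensg.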

Is there an elegant way to do this with aggregate? Or is there some faster way to do it with ddply?

1 Answer

  1. Editorial Team
    Added an answer on May 27, 2026 at 3:55 pm

    First let’s define a toy example:

    df <- data.frame(chromosome = gl(3, 10, labels = c('A', 'B', 'C')),
                     probeset = gl(3, 10, labels = c('X', 'Y', 'Z')),
                     ensg = gl(3, 10, labels = c('E1', 'E2', 'E3')),
                     symbol = gl(3, 10, labels = c('S1', 'S2', 'S3')),
                     XXA_00 = rnorm(30),
                     XXA_36 = rnorm(30),
                     XXB_00 = rnorm(30))
    

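    Then we use aggregate with the formula interface. Listing chromosome and symbol next to ensg on the right-hand side keeps them in the result; because they are constant within each ensg, they do not create any extra groups: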

    df1 <- aggregate(cbind(XXA_00, XXA_36, XXB_00) ~ ensg + chromosome + symbol,
                     data = df, FUN = mean)
    
    > df1
      ensg chromosome symbol      XXA_00      XXA_36      XXB_00
    1   E1          A     S1 -0.02533499 -0.06150447 -0.01234508
    2   E2          B     S2 -0.25165987  0.02494902 -0.01116426
    3   E3          C     S3  0.09454154 -0.48468517 -0.25644569
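One point to watch when moving from the question's `na.rm = TRUE` call to the formula interface: the formula method of aggregate defaults to `na.action = na.omit`, so a row with an NA in any of the averaged columns is dropped entirely before the means are computed. The sketch below, on made-up values, contrasts that default with passing `na.action = na.pass` together with `na.rm = TRUE`, which skips NAs column by column.

```r
# Hypothetical toy data with one missing value in XXA_00.
toy <- data.frame(
  ensg       = c("E1", "E1", "E2", "E2"),
  chromosome = c("A", "A", "B", "B"),
  symbol     = c("S1", "S1", "S2", "S2"),
  XXA_00     = c(1, NA, 3, 5),
  XXA_36     = c(2, 4, 6, 8)
)

# Default: the whole second row is removed, so XXA_36 for E1
# is averaged over a single row.
dropped <- aggregate(cbind(XXA_00, XXA_36) ~ ensg + chromosome + symbol,
                     data = toy, FUN = mean)

# Keep all rows and let mean() ignore NAs within each column.
kept <- aggregate(cbind(XXA_00, XXA_36) ~ ensg + chromosome + symbol,
                  data = toy, FUN = mean,
                  na.rm = TRUE, na.action = na.pass)
```

Here the XXA_36 mean for E1 is 2 in `dropped` and 3 in `kept`. Which behaviour is appropriate depends on how missing values should be treated in the real data.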
    