The Archive Base

Editorial Team
Asked: June 5, 2026 (2026-06-05T18:52:55+00:00)


I have a data frame of about 35,000 rows by 7 columns. It looks like this:

head(nuc)

  chr feature    start      end   gene_id    pctAT    pctGC length
1   1     CDS 67000042 67000051 NM_032291 0.600000 0.400000     10
2   1     CDS 67091530 67091593 NM_032291 0.609375 0.390625     64
3   1     CDS 67098753 67098777 NM_032291 0.600000 0.400000     25
4   1     CDS 67101627 67101698 NM_032291 0.472222 0.527778     72
5   1     CDS 67105460 67105516 NM_032291 0.631579 0.368421     57
6   1     CDS 67108493 67108547 NM_032291 0.436364 0.563636     55

gene_id is a factor with about 3,500 unique levels. For each level of gene_id, I want to get min(start), max(end), mean(pctAT), mean(pctGC), and sum(length).

I tried using lapply and do.call for this, but it's taking forever (30+ minutes) to run.
The code I'm using is:

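nuc_prof = lapply(levels(nuc$gene_id), function(gene){
  # rows belonging to this gene (re-scans the whole data frame for every level)
  rows = nuc[nuc$gene_id==gene, ]
  return(list(gene_id=gene, start=min(rows$start), end=max(rows$end), pctGC =
              mean(rows$pctGC), pctAT = mean(rows$pctAT), cdslength = sum(rows$length)))
})
# bind the per-gene lists together row by row
nuc_prof = do.call(rbind, nuc_prof)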

I’m certain I’m doing something wrong to slow this down. I haven’t waited for it to finish as I’m sure it can be faster. Any ideas?


1 Answer

  1. Editorial Team
    Added an answer on June 5, 2026 at 6:52 pm (2026-06-05T18:52:56+00:00)

    Since I’m in an evangelizing mood … here’s what the fast data.table solution would look like:

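    library(data.table)
    # Convert to a data.table keyed (sorted) by gene_id
    dt <- data.table(nuc, key="gene_id")
    
    # Compute all five summaries per gene_id in a single grouped call
    dt[,list(A=min(start),    # first start position
             B=max(end),      # last end position
             C=mean(pctAT),   # mean AT fraction
             D=mean(pctGC),   # mean GC fraction
             E=sum(length)), by=key(dt)]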
    #      gene_id        A        B         C         D   E
    # 1: NM_032291 67000042 67108547 0.5582567 0.4417433 283
    # 2:       ZZZ 67000042 67108547 0.5582567 0.4417433 283
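
For anyone who wants to try this without the full data set, here is a minimal self-contained sketch. It assumes the six rows from head(nuc) in the question plus a copy of them relabelled "ZZZ" (that second gene is invented purely so there are two groups), and it uses the question's column names for the result instead of A to E. The base R tapply call at the end is only a sanity check, not part of the data.table approach.

```r
library(data.table)

# Toy data: the six rows shown by head(nuc) in the question.
nuc <- data.frame(
  chr     = 1,
  feature = "CDS",
  start   = c(67000042, 67091530, 67098753, 67101627, 67105460, 67108493),
  end     = c(67000051, 67091593, 67098777, 67101698, 67105516, 67108547),
  gene_id = "NM_032291",
  pctAT   = c(0.600000, 0.609375, 0.600000, 0.472222, 0.631579, 0.436364),
  pctGC   = c(0.400000, 0.390625, 0.400000, 0.527778, 0.368421, 0.563636),
  length  = c(10, 64, 25, 72, 57, 55),
  stringsAsFactors = FALSE
)

# Assumption: duplicate the rows under a made-up gene id to get a second group.
nuc_copy <- nuc
nuc_copy$gene_id <- "ZZZ"
nuc <- rbind(nuc, nuc_copy)
nuc$gene_id <- factor(nuc$gene_id)

# Keyed data.table, then one grouped aggregation over gene_id.
dt <- data.table(nuc, key = "gene_id")
nuc_prof <- dt[, list(start     = min(start),
                      end       = max(end),
                      pctAT     = mean(pctAT),
                      pctGC     = mean(pctGC),
                      cdslength = sum(length)),
               by = gene_id]
print(nuc_prof)

# Sanity check against base R: per-gene total length via tapply.
base_len <- tapply(nuc$length, nuc$gene_id, sum)
stopifnot(all(nuc_prof$cdslength == base_len[as.character(nuc_prof$gene_id)]))
```

The grouped call visits each group once, whereas the lapply version in the question re-filters the whole data frame for every level of gene_id.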
    