The Archive Base

Editorial Team
Asked: May 25, 2026

There are two numeric columns in a data file. I need to calculate the average of the second column over fixed-width intervals (for example, 100) of the first column.

I can do this in R, but my R code is really slow on a relatively large data file (millions of rows, with values in the first column ranging from 1 to 33132539).

My R code is below. How can I make it faster? Solutions in Perl, Python, awk or shell are also welcome.

Thanks in advance.

(1) My data file (tab-delimited, millions of rows):

5380    30.07383
5390    30.87
5393    0.07383
5404    6
5428    30.07383
5437    1
5440    9
5443    30.07383
5459    6
5463    30.07383
5480    7
5521    30.07383
5538    0
5584    20
5673    30.07383
5720    30.07383
5841    3
5880    30.07383
5913    4
5958    30.07383
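Because the file has millions of rows, one option in Python is to stream it line by line instead of loading it all into memory. A minimal sketch, assuming each line holds exactly two whitespace-separated fields (an integer position and a float value); read_pairs is a hypothetical name:

```python
import io
from typing import Iterable, Iterator, Tuple


def read_pairs(lines: Iterable[str]) -> Iterator[Tuple[int, float]]:
    """Yield (pos, rho) pairs from lines, skipping blank lines.

    split() with no argument splits on any whitespace, so this works whether
    the separator is a tab or a run of spaces (assumed: exactly two fields).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        pos, rho = line.split()
        yield int(pos), float(rho)


# Usage with an in-memory sample; for the real file, pass open("my_data_file").
sample = io.StringIO("5380\t30.07383\n5390\t30.87\n5393\t0.07383\n")
print(list(read_pairs(sample)))
```

Being a generator, read_pairs holds only one line at a time, so it can feed a single-pass aggregation without materializing the whole table.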

(2) What I want to get, with interval = 100:

interval_of_first_column, average_of_second_column_in_that_interval
100, 0
200, 0
300, 20.34074
400, 14.90325
.....

(3) My R code:

chr1 <- 33132539 # upper limit of the first column
window <- 100 # size of each interval

spe <- read.table("my_data_file", header=FALSE, sep="\t") # read my data in
names(spe) <- c("pos", "rho") # name my data

interval.chr1 <- data.frame(pos=seq(0, chr1, window)) # lower bound of each interval
meanrho.chr1 <- numeric(nrow(interval.chr1)) # preallocate the means I want to get

# real calculation, really slow on my own data: every iteration rescans all rows of spe.
for(i in 1:nrow(interval.chr1)){
  count.sub <- subset(spe, pos >= interval.chr1$pos[i] & pos < interval.chr1$pos[i] + window)
  meanrho.chr1[i] <- mean(count.sub$rho)
}
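The loop is slow because subset() scans every row of spe once per interval, so the work grows with the number of intervals (about 331,000 here) times the number of rows. Since Python solutions are welcome, here is a minimal single-pass sketch, assuming the rows arrive as (pos, rho) pairs; interval_means is a hypothetical name, and each interval is keyed by its lower bound:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def interval_means(pairs: Iterable[Tuple[int, float]], window: int = 100) -> Dict[int, float]:
    """Mean of rho per interval [k*window, (k+1)*window), keyed by k*window.

    Each row is visited once: integer division picks its interval, and a
    running sum and count are kept per interval, so memory grows with the
    number of non-empty intervals rather than the number of rows.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for pos, rho in pairs:
        key = (pos // window) * window  # lower bound of the interval holding pos
        sums[key] += rho
        counts[key] += 1
    # Return intervals in ascending order of their lower bound.
    return {key: sums[key] / counts[key] for key in sorted(sums)}


data = [(5380, 30.07383), (5390, 30.87), (5393, 0.07383), (5404, 6.0), (5428, 30.07383)]
for lower, avg in interval_means(data).items():
    print(f"{lower}, {avg:.5f}")
```

The half-open intervals [k*window, (k+1)*window) mean a boundary value such as 5400 is counted in exactly one interval.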

1 Answer

    Editorial Team
    Added an answer on May 25, 2026 at 7:31 pm

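    You don’t need to set up an output data.frame, though you can if you want. Here is how I would code it. It should be much faster than the loop, because %/% (integer division) assigns every row to its interval in one vectorized step and tapply then computes every interval mean at once, instead of rescanning the whole data frame once per interval. This assumes the data were read with dat <- read.table("my_data_file"), so the columns carry read.table's default names V1 and V2: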

    > dat$incrmt <- dat$V1 %/% 100
    > dat
         V1       V2 incrmt
    1  5380 30.07383     53
    2  5390 30.87000     53
    3  5393  0.07383     53
    4  5404  6.00000     54
    5  5428 30.07383     54
    6  5437  1.00000     54
    7  5440  9.00000     54
    8  5443 30.07383     54
    9  5459  6.00000     54
    10 5463 30.07383     54
    11 5480  7.00000     54
    12 5521 30.07383     55
    13 5538  0.00000     55
    14 5584 20.00000     55
    15 5673 30.07383     56
    16 5720 30.07383     57
    17 5841  3.00000     58
    18 5880 30.07383     58
    19 5913  4.00000     59
    20 5958 30.07383     59
    
    > with(dat, tapply(V2, incrmt, mean, na.rm=TRUE))
          53       54       55       56       57       58       59 
    20.33922 14.90269 16.69128 30.07383 30.07383 16.53692 17.03692 
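The same grouping can be written in Python with itertools.groupby and a key of V1 // 100. Note that groupby only merges adjacent rows, so this sketch assumes the input is sorted by the first column, as the sample is; it should reproduce the tapply means above up to printing precision:

```python
from itertools import groupby
from statistics import mean

rows = [
    (5380, 30.07383), (5390, 30.87), (5393, 0.07383), (5404, 6), (5428, 30.07383),
    (5437, 1), (5440, 9), (5443, 30.07383), (5459, 6), (5463, 30.07383),
    (5480, 7), (5521, 30.07383), (5538, 0), (5584, 20), (5673, 30.07383),
    (5720, 30.07383), (5841, 3), (5880, 30.07383), (5913, 4), (5958, 30.07383),
]

# Key each row by V1 // 100; Python's // is floor division, which matches
# R's %/% for these non-negative values. groupby needs rows sorted by the key.
by100 = {
    incrmt: mean(v2 for _, v2 in group)
    for incrmt, group in groupby(rows, key=lambda row: row[0] // 100)
}

for incrmt, m in by100.items():
    print(incrmt, round(m, 5))
```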
    

    You can skip the setup entirely (no incrmt variable) by grouping on V1 %/% 100 directly:

    > with(dat, tapply(V2, V1 %/% 100, mean, na.rm=TRUE))
          53       54       55       56       57       58       59 
    20.33922 14.90269 16.69128 30.07383 30.07383 16.53692 17.03692 
    

    To keep the result for later use, assign it to a variable:

    by100MeanV2 <- with(dat, tapply(V2, V1 %/% 100, mean, na.rm=TRUE))
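tapply only returns groups that contain data, and its names are interval indices (53 stands for 5300 to 5399). The desired output in the question lists every interval, with 0 for empty ones. A sketch of that last step in Python, assuming 0 is the wanted placeholder for empty intervals (NaN may be safer, since 0 is also a real value in the data) and labeling each interval by its lower bound; fill_intervals is a hypothetical name:

```python
from typing import Dict, List, Tuple


def fill_intervals(means: Dict[int, float], limit: int, window: int = 100) -> List[Tuple[int, float]]:
    """Return (lower_bound, mean) for every interval from 0 up to limit.

    means maps an interval index (pos // window, the same indices tapply uses
    as names) to its mean; intervals with no rows get 0.0, following the
    desired output in the question. The interval count matches R's
    seq(0, limit, window).
    """
    return [(k * window, means.get(k, 0.0)) for k in range(limit // window + 1)]


means = {53: 20.33922, 54: 14.90269}  # two values from the tapply output above
table = fill_intervals(means, limit=5500)
for lower, m in table[52:56]:
    print(f"{lower}, {m}")
```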
    
Related Questions

Just learning R. Given a data.frame in R with two columns, one numeric and
I have a data.frame with 20 columns. The first two are factors, and the
I have a data frame with two columns: an ID and some numeric value
There are two popular closure styles in javascript. The first I call anonymous constructor
There are two scenarios I need to clarify: An executable compiled with .NET 3.5
When converting a data frame with mixed factor and numeric columns to an xts,
I have a table that has two columns both of them are continuous data.
I have a table with two columns [id, value] both numeric. In this example:
I need a RegEx for a numeric value with up to two decimal places
There are two weird operators in C#: the true operator the false operator If

© 2021 The Archive Base. All Rights Reserved