Editorial Team
Asked: June 4, 2026


I have a dataframe showing a date, an item and a value, and I want to add a column showing the average of the 50 previous entries for the same item (or NA if that item hasn’t had 50 yet). For example, the table could be:

      data
date     item value  
01/01/01 a    2  
01/01/01 b    1.5  
04/01/01 c    1.7  
05/01/01 a    1.9  
......

and part of it could become

date     item value last_50_mean   
........ 
11/09/01 a    1.2   1.1638
12/09/01 b    1.9   1.5843 
12/09/01 a    1.4   1.1621
13/09/01 c    0.9   NA
........

So in this case, the mean of the 50 entries for a before 11/09/01 is 1.1638, while c hasn’t had 50 entries before 13/09/01, so it returns NA.

I am currently doing this with the following code:

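  # assumes data$date is a Date (not a dd/mm/yy string) and rows are in date order
  data[, 'last_50_mean'] <- sapply(1:nrow(data), function(i){
        # earlier rows for the same item
        prevDates <- data[data$date < data$date[i] & data$item == data$item[i], ]
        num       <- nrow(prevDates)
        if (num >= 50) {
          # mean of the 50 most recent earlier entries
          round(mean(prevDates$value[(num - 49):num]), 4)
        } else {
          NA   # fewer than 50 earlier entries
        }
      }
  )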

But my dataframe is large and this is taking a long time (in fact, I’m not 100% sure it works, as it is still running). Does anyone know the best way to do this?
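The loop compares data$date values, which only gives chronological results if they are real dates: as dd/mm/yy strings they compare as text. A minimal sketch of the preparation step, assuming the dd/mm/yy format of the example table (the four rows below are taken from it, deliberately shuffled), parses the dates with as.Date and sorts the rows so that earlier entries come first:

```r
# Four rows from the example table, deliberately out of order.
data <- data.frame(date  = c("05/01/01", "01/01/01", "04/01/01", "01/01/01"),
                   item  = c("a", "a", "c", "b"),
                   value = c(1.9, 2, 1.7, 1.5))

# As text, "05/01/01" sorts after "01/02/01" even though 5 January comes
# before 1 February; as.Date turns the strings into calendar dates.
data$date <- as.Date(data$date, format = "%d/%m/%y")

# Sort chronologically so that "previous entries" means "earlier rows".
# order() leaves ties (same date) in their original relative order.
data <- data[order(data$date), ]
data
```

After this step, comparisons such as data$date < data$date[i] follow the calendar, and any approach that relies on row order sees entries in time order.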


1 Answer

    Editorial Team
    Added an answer on June 4, 2026 at 11:07 am

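    The mean of N consecutive observations can be calculated from the cumulative sum: diff(cumsum(x), lag=N - 1) gives the sum of the last N - 1 values of each window, adding the first value of the window gives the sum of all N, and dividing by N gives the mean. Your question wants the first N - 1 values to be padded with NA, so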

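    meanN <- function(x, N=50)
        ## mean of last N observations, padded in front with NA
    {
        if (length(x) < N)                   # too short for any full window
            return(rep(NA_real_, length(x)))
        x0 <- x[seq_len(length(x) - N + 1)]  # first value of each window
        x1 <- (x0 + diff(cumsum(x), lag=N-1)) / N
        c(rep(NA, N - 1), x1)
    }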
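One way to check the cumsum identity is to compare meanN with a direct rolling mean on random data. A minimal sketch; naiveMeanN is a hypothetical helper added only for this check, and meanN is repeated here (with a guard for vectors shorter than N) so the block runs on its own:

```r
meanN <- function(x, N=50)
    ## mean of last N observations, padded in front with NA
{
    if (length(x) < N)
        return(rep(NA_real_, length(x)))
    x0 <- x[seq_len(length(x) - N + 1)]
    x1 <- (x0 + diff(cumsum(x), lag=N-1)) / N
    c(rep(NA, N - 1), x1)
}

# Hypothetical brute-force reference: average x[(i - N + 1):i] directly.
naiveMeanN <- function(x, N = 50) {
    vapply(seq_along(x), function(i)
        if (i < N) NA_real_ else mean(x[(i - N + 1):i]),
        numeric(1))
}

set.seed(1)
x <- runif(200, 1, 3)
fast <- meanN(x, N = 50)
slow <- naiveMeanN(x, N = 50)
all.equal(fast, slow)   # TRUE
```

The direct version recomputes every window, so it costs O(n·N) per group, while meanN needs a single cumsum pass.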
    

    You’d like to do this for several groups. For a data.frame like

    df <- data.frame(item=sample(letters[1:3], 1000, TRUE),
                     value=runif(1000, 1, 3),
                     last_50_mean=NA)
    

    one way of doing this is

    split(df$last_50_mean, df$item) <- lapply(split(df$value, df$item), meanN)
    

    leading to, for instance,

    > tail(df)
         item    value last_50_mean
    995     c 1.191486     2.037707
    996     c 2.899214     2.073022
    997     c 2.019375     2.054914
    998     c 2.737043     2.066389
    999     a 1.703752     1.923234
    1000    c 1.602442     2.043517
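Note that meanN averages the window that ends at the current row, while the question asks for the mean of the 50 entries before it, with NA until 50 earlier entries exist. A minimal sketch of that lagged variant; prevMeanN is a hypothetical name, and it uses a cumsum padded with a leading zero rather than the answer's diff form:

```r
# Mean of the N entries before each row (current row excluded),
# NA until N earlier entries exist. Hypothetical helper name.
prevMeanN <- function(x, N = 50) {
    n <- length(x)
    if (n <= N)
        return(rep(NA_real_, n))
    cs <- c(0, cumsum(x))     # cs[k + 1] == sum(x[1:k])
    i <- (N + 1):n            # rows with at least N earlier entries
    # sum(x[(i - N):(i - 1)]) == cs[i] - cs[i - N]
    c(rep(NA_real_, N), (cs[i] - cs[i - N]) / N)
}

x <- c(2, 4, 6, 8, 10)
prevMeanN(x, N = 2)   # NA NA 3 5 7
```

It can replace meanN in the same split<- call. Like meanN, it counts earlier rows rather than earlier dates, so an entry on the same date that appears earlier in the data is included, unlike the strict date < comparison in the question's loop.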
    

    This assumes that your data frame is ordered by time. A potential problem with long vectors is that cumsum loses floating-point precision as the running total grows (or overflows, if value is stored as integer); one could address this by centering value (subtracting its mean) so that cumsum is expected not to stray far from zero. A recent question addressed alternatives to split<- and dropping the last N observations.
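Both points can be combined in a short sketch, assuming value is numeric: meanNCentered (a hypothetical name) subtracts the mean before taking cumsum and adds it back at the end, and base R's ave(), which itself uses split<- with lapply, writes each item's results back into the original row positions:

```r
# Rolling mean of the last N observations (current row included), computed
# on centred values so the partial sums stay small. Hypothetical helper name.
meanNCentered <- function(x, N = 50) {
    n <- length(x)
    if (n < N)
        return(rep(NA_real_, n))
    m <- mean(x)                   # centre; added back at the end
    cs <- c(0, cumsum(x - m))      # cs[k + 1] == sum((x - m)[1:k])
    i <- N:n                       # rows where a full window ends
    c(rep(NA_real_, N - 1), (cs[i + 1] - cs[i + 1 - N]) / N + m)
}

set.seed(1)
df <- data.frame(item  = sample(letters[1:3], 1000, TRUE),
                 value = runif(1000, 1, 3))

# ave() applies FUN to each item's values (in row order) and returns a
# vector aligned with the original rows.
df$last_50_mean <- ave(df$value, df$item, FUN = meanNCentered)
tail(df)
```

Centering only changes the magnitude of the intermediate sums, not the result, so it is cheap insurance for long groups.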



Related Questions

I have a dataframe and would like to add a new column where the
I have a pandas DataFrame with a date column. It is not an index.
I have a dataframe with columns labeled A,B & C. I want to add
I have a dataframe called prices_df to which I add an extra column and
I have a dataframe with numeric entries like this one test <- data.frame(x =
I have a dataframe with a column of integers that I would like to
I have a dataframe with one column that I would like to split into
I'd like to melt the dataframe so that in one column I have dates
I have a dataframe with 10 rows df <- c(1:10) How do I add
I have a dataframe like this: block plot date data 1 1 aug 11.95171507


© 2021 The Archive Base. All Rights Reserved