I have a dataframe showing a date, an item and a value, and I

Question

0

Asked: June 4, 20262026-06-04T11:07:31+00:00 2026-06-04T11:07:31+00:00

I have a dataframe showing a date, an item and a value, and I

0

I have a dataframe showing a date, an item and a value, and I want to add a column showing the average of its 50 previous entries (or NA if it hasn’t had 50) e.g. the table could be

      data
date     item value  
01/01/01 a    2  
01/01/01 b    1.5  
04/01/01 c    1.7  
05/01/01 a    1.9  
......

and part of it could become

date     item value last_50_mean   
........ 
11/09/01 a    1.2   1.1638
12/09/01 b    1.9   1.5843 
12/09/01 a    1.4   1.1621
13/09/01 c    0.9   NA
........

So in this case the mean of a in the 50 entries before 11/09/01 is 1.1638 and c hasn’t had 50 entries before 13/09/01 so returns NA

I am currently doing this using the following function

  data[, 'last_50_mean'] <- sapply(1:nrow(data), function(i){
        prevDates <- data[data$date < data$date[i] & data$item == data$item[i], ]
        num       <- nrow(prevGames)
        if(nGames >= 50){
          round(mean(prevDates[(num- 49):num, ]$value), 4)
        }
      }
  )

But my dataframe is large and it is taking a long time (in fact I’m not 100% sure it works as it is still running… Does anyone know of the best way to do this?

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-04T11:07:33+00:00

The mean of N observations can be calculated from the cumulative sum and the difference between the first and last value, diff(cumsum(x), lag=N - 1). Your question wants the first N – 1 values to be padded, so

meanN <- function(x, N=50)
    ## mean of last N observations, padded in front with NA
{
    x0 <- x[seq_len(length(x) - N + 1)]
    x1 <- (x0 + diff(cumsum(x), lag=N-1)) / N
    c(rep(NA, N - 1), x1)
}

You’d like to do this for several groups. For a data.frame like

df <- data.frame(item=sample(letters[1:3], 1000, TRUE),
                 value=runif(1000, 1, 3),
                 last_50_mean=NA)

one way of doing this is

split(df$last_50_mean, df$item) <- lapply(split(df$value, df$item), meanN)

leading to for instance

> tail(df)
     item    value last_50_mean
995     c 1.191486     2.037707
996     c 2.899214     2.073022
997     c 2.019375     2.054914
998     c 2.737043     2.066389
999     a 1.703752     1.923234
1000    c 1.602442     2.043517

This assumes that your data frame is ordered by time. A potential problem is when long vectors overflow cumsum; one could address this by centering value so the expectation is that cumsum doesn’t stray too far from zero. A recent question addressed alternatives to split<- and dropping the last N observations.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I have a dataframe showing a date, an item and a value, and I

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply