I am trying to calculated the lagged difference (or actual increase) for data that

Question

0

Asked: May 30, 20262026-05-30T18:50:07+00:00 2026-05-30T18:50:07+00:00

I am trying to calculated the lagged difference (or actual increase) for data that

0

I am trying to calculated the lagged difference (or actual increase) for data that has been inadvertently aggregated. Each successive year in the data includes values from the previous year. A sample data set can be created with this code:

set.seed(1234)
x <- data.frame(id=1:5, value=sample(20:30, 5, replace=T), year=3)
y <- data.frame(id=1:5, value=sample(10:19, 5, replace=T), year=2)
z <- data.frame(id=1:5, value=sample(0:9, 5, replace=T), year=1)
(df <- rbind(x, y, z))

I can use a combination of lapply() and split() to calculate the difference between each year for every unique id, like so:

(diffs <- lapply(split(df, df$id), function(x){-diff(x$value)}))

However, because of the nature of the diff() function, there are no results for the values in year 1, which means that after I flatten the diffs list of lists with Reduce(), I cannot add the actual yearly increases back into the data frame, like so:

df$actual <- Reduce(c, diffs)  # flatten the list of lists

In this example, there are only 10 calculated differences or lags, while there are 15 rows in the data frame, so R throws an error when trying to add a new column.

How can I create a new column of actual increases with (1) the values for year 1 and (2) the calculated diffs/lags for all subsequent years?

This is the output I’m eventually looking for. My diffs list of lists calculates the actual values for years 2 and 3 just fine.

id value year actual
 1    21    3      5
 2    26    3     16
 3    26    3     14
 4    26    3     10
 5    29    3     14
 1    16    2     10
 2    10    2      5
 3    12    2     10
 4    16    2      7
 5    15    2     13
 1     6    1      6
 2     5    1      5
 3     2    1      2
 4     9    1      9
 5     2    1      2

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-30T18:50:09+00:00

I think this will work for you. When you run into the diff problem just lengthen the vector by putting 0 in as the first number.

df <- df[order(df$id, df$year), ]
sdf <-split(df, df$id)
df$actual <- as.vector(sapply(seq_along(sdf), function(x) diff(c(0, sdf[[x]][,2]))))
df[order(as.numeric(rownames(df))),]

There’s lots of ways to do this but this one is fairly fast and uses base.

Here’s a second & third way of approaching this problem utilizing aggregate and by:

aggregate:

df <- df[order(df$id, df$year), ]
diff2 <- function(x) diff(c(0, x))
df$actual <- c(unlist(t(aggregate(value~id, df, diff2)[, -1])))
df[order(as.numeric(rownames(df))),]

by:

df <- df[order(df$id, df$year), ]
diff2 <- function(x) diff(c(0, x))
df$actual <- unlist(by(df$value, df$id, diff2))
df[order(as.numeric(rownames(df))),]

plyr

df <- df[order(df$id, df$year), ]
df <- data.frame(temp=1:nrow(df), df)
library(plyr)
df <- ddply(df, .(id), transform, actual=diff2(value))
df[order(-df$year, df$temp),][, -1]

It gives you the final product of:

> df[order(as.numeric(rownames(df))),]
   id value year actual
1   1    21    3      5
2   2    26    3     16
3   3    26    3     14
4   4    26    3     10
5   5    29    3     14
6   1    16    2     10
7   2    10    2      5
8   3    12    2     10
9   4    16    2      7
10  5    15    2     13
11  1     6    1      6
12  2     5    1      5
13  3     2    1      2
14  4     9    1      9
15  5     2    1      2

EDIT: Avoiding the Loop

May I suggest avoiding the loop and turning what I gave to you into a function (the by solution is the easiest one for me to work with) and sapply that to the two columns you desire.

set.seed(1234)  #make new data with another numeric column
x <- data.frame(id=1:5, value=sample(20:30, 5, replace=T), year=3)
y <- data.frame(id=1:5, value=sample(10:19, 5, replace=T), year=2)
z <- data.frame(id=1:5, value=sample(0:9, 5, replace=T), year=1)
df <- rbind(x, y, z)
df <- df.rep <- data.frame(df[, 1:2], new.var=df[, 2]+sample(1:5, nrow(df), 
          replace=T), year=df[, 3])


df <- df[order(df$id, df$year), ]
diff2 <- function(x) diff(c(0, x))                   #function one
group.diff<- function(x) unlist(by(x, df$id, diff2)) #answer turned function
df <- data.frame(df, sapply(df[, 2:3], group.diff))  #apply group.diff to col 2:3
df[order(as.numeric(rownames(df))),]                 #reorder it

Of course you’d have to rename these unless you used transform as in:

df <- df[order(df$id, df$year), ]
diff2 <- function(x) diff(c(0, x))                   #function one
group.diff<- function(x) unlist(by(x, df$id, diff2)) #answer turned function
df <- transform(df, actual=group.diff(value), actual.new=group.diff(new.var))   
df[order(as.numeric(rownames(df))),]

This would depend on how many variables you were doing this to.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I am trying to calculated the lagged difference (or actual increase) for data that

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply