The Archive Base

Editorial Team
Asked: May 30, 2026 (2026-05-30T18:50:07+00:00)

I am trying to calculate the lagged difference (or actual increase) for data that has been inadvertently aggregated. Each successive year in the data includes the values from the previous year. A sample data set can be created with this code:

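# Note: the values shown in this post come from R's pre-3.6.0 sampling algorithm.
# On R >= 3.6.0, run RNGkind(sample.kind = "Rounding") before set.seed() to reproduce them.
set.seed(1234)
x <- data.frame(id=1:5, value=sample(20:30, 5, replace=TRUE), year=3)
y <- data.frame(id=1:5, value=sample(10:19, 5, replace=TRUE), year=2)
z <- data.frame(id=1:5, value=sample(0:9, 5, replace=TRUE), year=1)
(df <- rbind(x, y, z))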

I can use a combination of lapply() and split() to calculate the difference between each year for every unique id, like so:

(diffs <- lapply(split(df, df$id), function(x){-diff(x$value)}))

However, because of the nature of the diff() function, there are no results for the values in year 1, which means that after I flatten the diffs list of lists with Reduce(), I cannot add the actual yearly increases back into the data frame, like so:

df$actual <- Reduce(c, diffs)  # flatten the list of lists

In this example, there are only 10 calculated differences or lags, while there are 15 rows in the data frame, so R throws an error when trying to add a new column.
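The length mismatch can be checked directly. This is only a minimal sketch: the data frame is hard-coded from the desired-output table in this question (an assumption made so the check does not depend on the random number generator).

```r
# Values copied from the desired-output table; rows are ordered year 3, 2, 1.
df <- data.frame(
  id    = rep(1:5, times = 3),
  value = c(21, 26, 26, 26, 29, 16, 10, 12, 16, 15, 6, 5, 2, 9, 2),
  year  = rep(3:1, each = 5)
)

# Same call as above: one vector of differences per id.
diffs <- lapply(split(df, df$id), function(x) -diff(x$value))

lengths(diffs)         # 2 differences per id, because diff() drops one element
length(unlist(diffs))  # 10 values in total
nrow(df)               # 15 rows, so the flattened diffs cannot fill a column
```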

How can I create a new column of actual increases with (1) the values for year 1 and (2) the calculated diffs/lags for all subsequent years?

This is the output I’m eventually looking for. My diffs list of lists calculates the actual values for years 2 and 3 just fine.

id value year actual
 1    21    3      5
 2    26    3     16
 3    26    3     14
 4    26    3     10
 5    29    3     14
 1    16    2     10
 2    10    2      5
 3    12    2     10
 4    16    2      7
 5    15    2     13
 1     6    1      6
 2     5    1      5
 3     2    1      2
 4     9    1      9
 5     2    1      2
1 Answer

  1. Editorial Team
    Added an answer on May 30, 2026 at 6:50 pm (2026-05-30T18:50:09+00:00)

    I think this will work for you. To get around the diff problem, lengthen each id's vector by putting 0 in as the first number, so that diff() returns one result per row, including year 1.

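    # Sort by id, then year, so each id's values run from year 1 upward
    df <- df[order(df$id, df$year), ]
    sdf <- split(df, df$id)
    # Prepend 0 so diff() also returns the year-1 value; unlist() flattens per id
    df$actual <- unlist(lapply(sdf, function(g) diff(c(0, g$value))), use.names = FALSE)
    # Show the result in the original row order
    df[order(as.numeric(rownames(df))), ]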
    

    There are lots of ways to do this, but this one is fairly fast and uses only base R.
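To see why prepending 0 works, here is a minimal sketch for a single id. The vector name `cumulative` is only illustrative; the numbers are the id 1 values from the question's expected output, sorted from year 1 to year 3.

```r
# Aggregated (cumulative) values for one id, years 1, 2, 3.
cumulative <- c(6, 16, 21)

# Plain diff() returns n - 1 values, so the year-1 increase is lost.
plain <- diff(cumulative)

# With a leading 0, the first difference is the year-1 value itself,
# giving one increase per year.
padded <- diff(c(0, cumulative))

plain   # 10 5
padded  # 6 10 5
```

Summing the padded differences with `cumsum(padded)` gives back the original cumulative values, which is a quick way to confirm nothing was dropped.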

    Here are three more ways of approaching this problem, using aggregate, by, and plyr:

    aggregate:

    df <- df[order(df$id, df$year), ]
    diff2 <- function(x) diff(c(0, x))
    df$actual <- c(unlist(t(aggregate(value~id, df, diff2)[, -1])))
    df[order(as.numeric(rownames(df))),]
    

    by:

    df <- df[order(df$id, df$year), ]
    diff2 <- function(x) diff(c(0, x))
    df$actual <- unlist(by(df$value, df$id, diff2))
    df[order(as.numeric(rownames(df))),]
    

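    plyr:

    df <- df[order(df$id, df$year), ]
    df <- data.frame(temp=1:nrow(df), df)   # temp records the sorted row position
    library(plyr)
    diff2 <- function(x) diff(c(0, x))
    df <- ddply(df, .(id), transform, actual=diff2(value))
    df[order(-df$year, df$temp),][, -1]     # year descending, then id; drop temp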
    

    It gives you the final product of:

    > df[order(as.numeric(rownames(df))),]
       id value year actual
    1   1    21    3      5
    2   2    26    3     16
    3   3    26    3     14
    4   4    26    3     10
    5   5    29    3     14
    6   1    16    2     10
    7   2    10    2      5
    8   3    12    2     10
    9   4    16    2      7
    10  5    15    2     13
    11  1     6    1      6
    12  2     5    1      5
    13  3     2    1      2
    14  4     9    1      9
    15  5     2    1      2
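One way to sanity-check this result: within each id, the running total of `actual` (in year order) should rebuild the aggregated `value` column. The sketch below hard-codes the table above, and the name `rebuilt` is only illustrative.

```r
# The final table shown above, typed in by hand.
df <- data.frame(
  id     = rep(1:5, times = 3),
  value  = c(21, 26, 26, 26, 29, 16, 10, 12, 16, 15, 6, 5, 2, 9, 2),
  year   = rep(3:1, each = 5),
  actual = c(5, 16, 14, 10, 14, 10, 5, 10, 7, 13, 6, 5, 2, 9, 2)
)

# Work in id/year order so cumsum() runs from year 1 upward within each id.
o <- order(df$id, df$year)
rebuilt <- ave(df$actual[o], df$id[o], FUN = cumsum)

# TRUE when every cumulative total matches the aggregated value.
check <- all(rebuilt == df$value[o])
check
```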
    

    EDIT: Avoiding the Loop

    May I suggest avoiding the loop: turn what I gave you into a function (the by solution is the easiest one for me to work with) and sapply it over the two columns you want.

    set.seed(1234)  #make new data with another numeric column
    x <- data.frame(id=1:5, value=sample(20:30, 5, replace=T), year=3)
    y <- data.frame(id=1:5, value=sample(10:19, 5, replace=T), year=2)
    z <- data.frame(id=1:5, value=sample(0:9, 5, replace=T), year=1)
    df <- rbind(x, y, z)
    df <- df.rep <- data.frame(df[, 1:2], new.var=df[, 2]+sample(1:5, nrow(df), 
              replace=T), year=df[, 3])
    
    
    df <- df[order(df$id, df$year), ]
    diff2 <- function(x) diff(c(0, x))                   #function one
    group.diff<- function(x) unlist(by(x, df$id, diff2)) #answer turned function
    df <- data.frame(df, sapply(df[, 2:3], group.diff))  #apply group.diff to col 2:3
    df[order(as.numeric(rownames(df))),]                 #reorder it
    

    Of course, you’d have to rename the new columns unless you used transform, as in:

    df <- df[order(df$id, df$year), ]
    diff2 <- function(x) diff(c(0, x))                   #function one
    group.diff<- function(x) unlist(by(x, df$id, diff2)) #answer turned function
    df <- transform(df, actual=group.diff(value), actual.new=group.diff(new.var))   
    df[order(as.numeric(rownames(df))),]
    

    Which of the two is more convenient depends on how many variables you need to do this for.
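For many columns, a variant that takes the grouping and ordering as arguments may be easier to reuse. This is a sketch rather than the code above: it swaps in base R's `ave()` for `by()`, the helper name `group_diff` and the `.actual` suffix are made up for illustration, and `new.var` is filled with hypothetical values (twice `value`) so the result is easy to check.

```r
df <- data.frame(
  id    = rep(1:5, times = 3),
  value = c(21, 26, 26, 26, 29, 16, 10, 12, 16, 15, 6, 5, 2, 9, 2),
  year  = rep(3:1, each = 5)
)
df$new.var <- df$value * 2   # hypothetical second aggregated column

# Per-id increases for one column, returned in the original row order.
group_diff <- function(x, id, year) {
  o <- order(id, year)                     # year 1 first within each id
  out <- numeric(length(x))
  out[o] <- ave(x[o], id[o], FUN = function(v) diff(c(0, v)))
  out
}

# Apply the helper to every column of interest and add the results
# as new, already-named columns.
cols <- c("value", "new.var")
df[paste0(cols, ".actual")] <- lapply(df[cols], group_diff,
                                      id = df$id, year = df$year)
df
```

Because the helper writes results back by position, the data frame does not need to be sorted first or reordered afterwards.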
