Possible Duplicate:
Beginner tips on using plyr to calculate year-over-year change across groups
What is a good way to calcualte a year-on-year difference (new variable) for an existing data frame variable (i.e. sales) across multiple variable groups (i.e. Region and Food)?
Below is a example of the data frame structure:
Date Region Type Sales
1/1/2001 East Food 120
1/1/2001 West Housing 130
1/1/2001 North Food 130
1/2/2001 East Food 133
1/3/2001 West Housing 140
1/4/2001 North Food 150
….
….
1/29/2013 East Food 125
1/29/2013 West Housing 137
1/29/2013 North Food 1350
Also, in addition to differening the data, I would like to calcuate a a trailing (say 7 day) moving average.
Any guidance would be greatly appreciated.
Here is something to get you started.
data.tableis a great package for this sort of things as it provides a concise and easy-to-use syntax (once you are past the learning curve) for these kinds of things.Create a reproducible example
Add Year column
Calculate summary by year
Function to create year-on-year diffs for each region/product combo.
Calculate year-on-year difference by column. This works for my example as setkey(dt, Year) sorts the data table by Year, but if your example misses some years for some products/regions you have to be more careful.