R: how can I populate the rows of a data frame, in which each row represents a day, with a single common value for each year?
I have a data frame consisting of a date column, a price column and then various other columns derived from those two columns. One of the columns calculates, for each day in a given year, the percentage change in the price from the beginning of that year (this is related to an earlier question).
I want to add a column that holds, for each day of a given year, the percentage change in the price for the whole of that year. So, if the price rose 10% from the first to the last day of 2009, the column for all the days of 2009 should hold the value 10% (or 0.1). If the price fell by 2% between the first and last days of 2010, the column for each day in 2010 should hold the value -0.02 and so on.
The code I have so far is:
require(lubridate)
require(plyr)
# generate data
set.seed(12345)
df <- data.frame(date=seq(as.Date("2009/1/1"), by="day", length.out=1115),price=runif(1115, min=100, max=200))
# remove weekend days
df <- df[!(weekdays(as.Date(df$date)) %in% c('Saturday','Sunday')),]
# add some columns for later
df$year <- as.numeric(format(as.Date(df$date), format="%Y"))
df$month <- as.numeric(format(as.Date(df$date), format="%m"))
df$day <- as.numeric(format(as.Date(df$date), format="%d"))
df$daythisyear <- as.numeric(format(as.Date(df$date), format="%j"))
df <- transform(df, doy = as.Date(paste(2000, month, day, sep="/")))
df <- ddply(df, .(year), transform, pctchg = ((price/price[1])-1))
I realise that I can get the annual (year-on-year) change by using another data frame, something like this:
df.yr <- ddply(df, .(year), function(x) (x[nrow(x),2]/x[1,2])-1)
…but I can’t work out how to add the figures for the years to a column in an existing data frame, particularly given that (if you are working with 4 years of data) there are only 4 rows, one for each year, compared to about 800 in the data frame of daily data used to derive those 4 rows – you get a mismatch.
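One way around the row-count mismatch is a join on year, so that every daily row picks up the value for its year. This is a minimal sketch in base R rather than plyr, assuming the rows are already sorted by date; the data is a simplified stand-in (weekends not removed), and the names yrchg and df2 are hypothetical.

```r
# Simplified stand-in for the daily data (weekends are not removed here)
set.seed(12345)
df <- data.frame(date  = seq(as.Date("2009/1/1"), by = "day", length.out = 1115),
                 price = runif(1115, min = 100, max = 200))
df$year <- as.numeric(format(df$date, "%Y"))

# One row per year: change from the first to the last price of that year
df.yr <- do.call(rbind, lapply(split(df, df$year), function(x) {
  data.frame(year = x$year[1], yrchg = x$price[nrow(x)] / x$price[1] - 1)
}))

# Join on year so each daily row receives its year's value
df2 <- merge(df, df.yr, by = "year")
df2 <- df2[order(df2$date), ]  # merge() may reorder rows, so re-sort by date
```

The join repeats each of the 4 yearly values across all the days of the matching year, so df2 keeps one row per day.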
It is straightforward to use a for loop starting at the last row of the data frame and moving back up the daythisyear column to achieve this (if daythisyear on current row is larger than daythisyear on the row below, you have a change in year, so take new value from that row to use in the column being added etc). Nevertheless, I feel sure there must be a more R-colloquial approach using an apply function or ddply, which I have so far studiously avoided tackling. So my question is:
Q. How do I calculate the annual change in the value of a column and then insert that value, as a new column, into every row for that year?
I’ve not yet converted to being a ddply user, preferring instead to use ave when it is the obvious solution. I suspect that this code would translate across:
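A minimal sketch of how ave could be applied to this problem, assuming the rows are sorted by date within each year. The data is a simplified stand-in (weekends not removed) and yrchg is a hypothetical column name.

```r
# Simplified stand-in for the daily data (weekends are not removed here)
set.seed(12345)
df <- data.frame(date  = seq(as.Date("2009/1/1"), by = "day", length.out = 1115),
                 price = runif(1115, min = 100, max = 200))
df$year <- as.numeric(format(df$date, "%Y"))

# ave() returns a vector as long as its input; a single value returned by FUN
# is recycled across every row of that year's group
df$yrchg <- ave(df$price, df$year,
                FUN = function(p) p[length(p)] / p[1] - 1)
```

Because ave returns one value per input row, the result can be assigned straight to a new column without a separate per-year data frame or a join.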