I am trying to use plyr and approx to interpolate values of y for each year between the observed values.
Instead of just the 3 observations for each country,
I would like to have 11 observations: one for each year from 1985 to 1995.
Here is a sample data set:
country <- c("country a", "country a", "country a",
"country b", "country b", "country b",
"country c", "country c", "country c")
year <- c(1985, 1990, 1995,
1985, 1990, 1995,
1985, 1990, 1995)
y <- c(10, 12, 16,
NA, 23, 20,
12, 16, NA)
data <- data.frame(cbind(country,year,y))
The data set looks like this:
country year y
1 country a 1985 10
2 country a 1990 12
3 country a 1995 16
4 country b 1985 <NA>
5 country b 1990 23
6 country b 1995 20
7 country c 1985 12
8 country c 1990 16
9 country c 1995 <NA>
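Below is a short sketch that checks the column types. Wrapping the vectors in cbind() first turns them into a single character matrix, so year and y are no longer numeric. In R versions before 4.0.0, where stringsAsFactors = TRUE was the default, data.frame() then stores them as factors. Converting a factor to numbers returns its level codes rather than the years. That would likely explain all-NA output from approx here, because codes 1 to 3 lie outside xout = 1985:1995. Passing the vectors straight to data.frame() keeps the columns numeric:

```r
# The question's sample data, written with rep() for brevity
country <- rep(c("country a", "country b", "country c"), each = 3)
year <- rep(c(1985, 1990, 1995), times = 3)
y <- c(10, 12, 16, NA, 23, 20, 12, 16, NA)

# cbind() of a character vector with numeric vectors yields a character matrix,
# so year and y are no longer numeric in data.frame(cbind(...))
m <- cbind(country, year, y)
print(typeof(m))

# A factor converted to numbers gives its level codes, not the original labels
codes <- as.numeric(factor(c("1985", "1990", "1995")))
print(codes)

# Passing the vectors straight to data.frame() keeps year and y numeric
data <- data.frame(country, year, y, stringsAsFactors = FALSE)
str(data)
```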
I can get approx to work for a subset of the data with just one country:
a <- subset(data, data$country == "country a")
# interpolate y value for every year from 1985 to 1995
attach(a)
a.int <- approx(year,y, xout = 1985:1995, method = "linear")
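Below is a minimal sketch of the single-country case that refers to the columns explicitly instead of using attach(). attach() places the data frame behind the global environment on the search path, so the global vectors year and y created earlier take precedence over the columns of a. As a result, approx(year, y, ...) actually receives the observations from all three countries. The sketch assumes a numeric data frame built with data.frame(country, year, y):

```r
country <- rep(c("country a", "country b", "country c"), each = 3)
year <- rep(c(1985, 1990, 1995), times = 3)
y <- c(10, 12, 16, NA, 23, 20, 12, 16, NA)
data <- data.frame(country, year, y, stringsAsFactors = FALSE)

# Subset one country and pass its columns explicitly (no attach())
a <- subset(data, country == "country a")
a.int <- approx(a$year, a$y, xout = 1985:1995, method = "linear")

# approx() returns a list: x holds the xout points, y the interpolated values
print(a.int$y)  # 10.0 10.4 10.8 11.2 11.6 12.0 12.8 13.6 14.4 15.2 16.0
```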
But how do I use plyr to interpolate data for each country?
I’ve tried using dlply, but the output values are NA for each year
attach(data)
int <- dlply(data, .(country), function(i) approx(i$year, i$y, xout = 1985:1995,
method = "linear")$y )
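For comparison, here is a sketch of the same per-country loop in base R, run on the numeric data frame. It uses split() and lapply() in place of dlply(). Interior years now interpolate. Years outside each country's non-missing observations still come back NA, because approx() drops the NA pairs and then, under its default rule = 1, returns NA beyond the remaining x range:

```r
country <- rep(c("country a", "country b", "country c"), each = 3)
year <- rep(c(1985, 1990, 1995), times = 3)
y <- c(10, 12, 16, NA, 23, 20, 12, 16, NA)
data <- data.frame(country, year, y, stringsAsFactors = FALSE)

# One approx() call per country, the base-R analogue of dlply(data, .(country), ...)
int <- lapply(split(data, data$country), function(i)
  approx(i$year, i$y, xout = 1985:1995, method = "linear")$y)

# The NA pair is dropped, so country b is interpolated only from 1990 onward
print(int[["country b"]])  # NA NA NA NA NA 23.0 22.4 21.8 21.2 20.6 20.0
print(int[["country c"]])  # 12.0 12.8 13.6 14.4 15.2 16.0 NA NA NA NA NA
```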
How can I use plyr and approx together to interpolate values of y?
Also, once I get the correct approx output (which will be a list), how do I reshape the data so that it is in the original long format? Ideally, the data would have 11 rows for each country and one column with y values.
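One way to reshape such a list into long format is sketched below in base R. It assumes int is a named list holding 11 interpolated values per country, as returned by dlply() or lapply() over the countries. The names become the country column, and the values are concatenated in the same order:

```r
country <- rep(c("country a", "country b", "country c"), each = 3)
year <- rep(c(1985, 1990, 1995), times = 3)
y <- c(10, 12, 16, NA, 23, 20, 12, 16, NA)
data <- data.frame(country, year, y, stringsAsFactors = FALSE)

years <- 1985:1995
int <- lapply(split(data, data$country), function(i)
  approx(i$year, i$y, xout = years, method = "linear")$y)

# Stack the list: one row per country-year, one y column
long <- data.frame(
  country = rep(names(int), each = length(years)),
  year = rep(years, times = length(int)),
  y = unlist(int, use.names = FALSE),
  stringsAsFactors = FALSE
)
head(long, 12)
```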
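I would use ddply rather than dlply for this, since ddply combines the per-country results into a single data frame. However,
approx by default (rule = 1) returns NA for any xout value outside the range of the non-missing x values supplied. See ?approx for the different ways of changing this; for example, rule = 2 uses the value at the closest data extreme instead.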
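Here is a sketch of the ddply() approach. The helper interp() is a hypothetical name that returns an 11-row data frame for one country. ddply() stacks those pieces and adds the country column, which gives the long format directly. Extra arguments given to ddply() after the function are passed on to it, so rule = 2 can be switched on. The sketch assumes plyr is installed and that data holds numeric year and y columns:

```r
library(plyr)

country <- rep(c("country a", "country b", "country c"), each = 3)
year <- rep(c(1985, 1990, 1995), times = 3)
y <- c(10, 12, 16, NA, 23, 20, 12, 16, NA)
data <- data.frame(country, year, y, stringsAsFactors = FALSE)

# Hypothetical helper: interpolate one country's y over a fixed set of years
interp <- function(d, years = 1985:1995, rule = 1) {
  data.frame(year = years,
             y = approx(d$year, d$y, xout = years,
                        method = "linear", rule = rule)$y)
}

# ddply() splits by country, applies interp(), and adds the country column
int.long <- ddply(data, .(country), interp)

# rule = 2 is passed on to interp(); it repeats the nearest observed value
# outside each country's observed range instead of returning NA
int.filled <- ddply(data, .(country), interp, rule = 2)
head(int.filled, 12)
```

Whether filling the ends with rule = 2 is appropriate depends on the data. Keeping the default rule = 1 leaves those years NA rather than extrapolating.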