I am using R for some statistical analysis of time series. I have tried Googling around, but I can’t seem to find any definitive answers. Can anyone who knows more please point me in the right direction?
Example:
Let’s say I want to do a linear regression of two time series. The time series contain daily data, but there might be gaps here and there, so the time series are not regular. Naturally, I only want to compare data points where both time series have data. This is what I currently do to read the CSV files into a data frame:
library(zoo)
apples <- read.csv('/Data/apples.csv', as.is=TRUE)
oranges <- read.csv('/Data/oranges.csv', as.is=TRUE)
apples$date <- as.Date(apples$date, "%d/%m/%Y")
oranges$date <- as.Date(oranges$date, "%d/%m/%Y")
zapples <- zoo(apples$close,apples$date)
zoranges <- zoo(oranges$close,oranges$date)
zdata <- merge(zapples, zoranges, all=FALSE)
data <- as.data.frame(zdata)
Is there a slicker way of doing this?
Also, how can I slice the data, e.g., select the entries in data with dates within a certain period?
Try something along these lines. This assumes that the dates are in column 1.
The dyn package can be used to transform lm, glm and many similar regression-type functions into ones that accept zoo series. Write dyn$lm in place of lm.
You don’t need all = FALSE, since lm will ignore rows with NAs under the default setting of its na.action argument.
The window.zoo function can be used to slice data.
Depending on what you want to do, you might also want to look at the xts and quantmod packages.
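Here is a minimal sketch of how the pieces above might fit together. It is an illustration under stated assumptions, not necessarily the exact code the answer had in mind: the inline sample data, the variable names (apples_csv, oranges_csv, fit, jan_slice) and the date range are all made up, and read.zoo is used on the assumption that the dates sit in column 1 in the dd/mm/yyyy format from the question. To read real files, pass the file path as the first argument of read.zoo instead of text =.

```r
library(zoo)  # read.zoo, merge, window
library(dyn)  # dyn$lm: lm wrapper that accepts zoo series

# Hypothetical sample data standing in for apples.csv and oranges.csv.
# Dates are in column 1; each series has a gap the other does not.
apples_csv <- "date,close
01/01/2010,10.0
04/01/2010,10.5
05/01/2010,10.2
06/01/2010,10.8
07/01/2010,11.1"

oranges_csv <- "date,close
01/01/2010,20.1
05/01/2010,20.9
06/01/2010,21.4
07/01/2010,22.0
08/01/2010,22.3"

fmt <- "%d/%m/%Y"

# read.zoo reads the table and uses column 1 as the time index
zapples  <- read.zoo(text = apples_csv,  header = TRUE, sep = ",", format = fmt)
zoranges <- read.zoo(text = oranges_csv, header = TRUE, sep = ",", format = fmt)

# Default (outer) merge: a date missing from one series becomes NA there
zdata <- merge(zapples, zoranges)

# dyn$lm aligns the zoo series by date; incomplete rows are dropped
# by the default na.action, so all = FALSE is not needed
fit <- dyn$lm(zapples ~ zoranges)
print(coef(fit))

# Slice a date range with window (dispatches to window.zoo)
jan_slice <- window(zdata,
                    start = as.Date("2010-01-05"),
                    end   = as.Date("2010-01-07"))
print(jan_slice)
```

If you would rather stay with plain lm, as.data.frame(zdata) together with lm's default na.action should behave much the same way for a simple regression like this one.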