I have a data frame with three columns (Id, Date and Value) and want to downsample it by averaging: take each block of 20 consecutive rows, compute the mean of Value over those rows, and add it to a new data frame with the same structure. Date should be the first Date value of each block of 20 rows.
I tried it this way (probably horrible :):
    resample.downsample <- function(data, by=20)
    {
        i <- 0
        nmax <- nrow(data)
        means <- c()
        while(i < nmax)
        {
            means <- c(means, mean(subset(data, Id > i & Id <= i+by)$Value))
            i <- i+by
        }
        return (
            data.frame(
                Id = seq(1, length.out=(nmax/by), by=1),
                Date = seq(startDate, length.out=(nmax/by), by=(1/by)),
                Value = means
            )
        )
    }
This works for small datasets, but runs forever on my real datasets (~4000000 rows). Any ideas how to optimize this function?
Sample data (input; the output should have the same structure; classes: integer, numeric, POSIXct/POSIXt):
      Value Id                Date
    1   125  1 2011-06-30 22:41:50
    2   127  2 2011-06-30 22:41:50
    3   126  3 2011-06-30 22:41:50
    4   123  4 2011-06-30 22:41:50
    5   130  5 2011-06-30 22:41:50
    6   131  6 2011-06-30 22:41:50
    7   128  7 2011-06-30 22:41:50
See the answer to "How to get the sum of each four rows of a matrix in R" for a method that should work for you. In your case it would be:
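As a rough sketch of that matrix approach (not necessarily the linked answer's exact code): reshape Value into a matrix with one block of rows per column, then take colMeans. The function name downsample_mean is made up for illustration. The sketch assumes the rows are already ordered by Id, drops any incomplete trailing block, and takes the Date of the first row of each block by indexing.

```r
# Sketch only: block-wise mean via a matrix reshape.
# Assumes `data` has columns Id, Date, Value and is already ordered by Id.
downsample_mean <- function(data, by = 20) {
  n_blocks <- nrow(data) %/% by      # number of complete blocks
  keep <- seq_len(n_blocks * by)     # rows that belong to a complete block
  # matrix() fills column-wise, so each column holds `by` consecutive rows
  m <- matrix(data$Value[keep], nrow = by)
  first <- seq(1, by = by, length.out = n_blocks)  # first row of each block
  data.frame(
    Id = seq_len(n_blocks),
    Date = data$Date[first],
    Value = colMeans(m)
  )
}

# Usage with the sample rows from the question, in blocks of 3 rows
d <- data.frame(
  Value = c(125, 127, 126, 123, 130, 131, 128),
  Id = 1:7,
  Date = rep(as.POSIXct("2011-06-30 22:41:50", tz = "UTC"), 7)
)
res <- downsample_mean(d, by = 3)
res$Value   # 126 128 (the 7th row is an incomplete block and is dropped)
```

This avoids growing a vector in a loop and calling subset() on the whole data frame for every block, which is likely where most of the time goes in the original function.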
Your current method to get the first Date should be fine.