I have been struggling with this for a while. I am new to working with ts data and all related R packages.
I have a df with several variables, including the ‘time of day’ in GMT (“%H%M”) and the date (“%Y/%m/%e”) when sampling occurred. I want to bin/aggregate my date data into “weeks” (i.e., %W/%g) and calculate the mean ‘time of day’ at which sampling occurred during each week.
I was able to apply other functions to numerical variables (e.g., weight) by first transforming my df into a zoo object and then using the aggregate.zoo command as follows:
#calculate the sum weight captured every week
x2c <- aggregate(OA_zoo, as.Date(cut(time(OA_zoo), "week")), sum)
However, I am not sure how to get around the fact that I am working with Date format rather than num and would appreciate any tips!
Also, I have obviously been coding way too much by handling each of my variables separately. Would there be a way of applying different functions (sum/mean/max/min) to my df while aggregating “weekly” using plyr? Or some other package?
EDITS/CLARIFICATIONS
Here’s the dput output of a sample of my full dataset. I have data from 2004-2011. What I would like to look at/plot using ggplot2 is the mean/median of TIME (%H%M) aggregated by week over time (2004-2011). Right now, my data is not aggregated by week, but is daily (random sample).
> dput(godin)
structure(list(depth = c(878, 1200, 1170, 936, 942, 964, 951,
953, 911, 969, 960, 987, 991, 997, 1024, 978, 1024, 951, 984,
931, 1006, 929, 973, 986, 935, 989, 1042, 1015, 914, 984), duration = c(0.8,
2.6, 6.5, 3.2, 4.1, 6.4, 7.2, 5.3, 7.4, 7, 7, 5.5, 7.5, 7.3,
7.5, 7, 4.2, 3, 5, 5, 9.3, 7.9, 7.3, 7.2, 7, 5.2, 8, 6, 7.5,
7), Greenland = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 40L, 28L, 0L,
0L, 34L, 7L, 28L, 0L, 0L, 0L, 27L, 0L, 0L, 0L, 44L, 59L, 0L,
0L, 0L, 0L, 0L, 0L), date2 = structure(c(12617, 12627, 12631,
12996, 12669, 13036, 12669, 13036, 12670, 13036, 12670, 13037,
12671, 13037, 12671, 13037, 12671, 13038, 12672, 13038, 12672,
13038, 12672, 13039, 12631, 12997, 12673, 13039, 12673, 13039
), class = "Date"), TIME = c("0940", "0145", "0945", "2045",
"1615", "0310", "2130", "1045", "0625", "1830", "1520", "0630",
"0035", "1330", "0930", "2215", "2010", "0645", "0155", "1205",
"0815", "1845", "2115", "0350", "1745", "0410", "0550", "1345",
"1515", "2115")), .Names = c("depth", "duration", "Greenland",
"date2", "TIME"), class = "data.frame", row.names = c("6761",
"9019", "9020", "9021", "9022", "9023", "9024", "9025", "9026",
"9027", "9028", "9029", "9030", "9031", "9032", "9033", "9034",
"9035", "9036", "9037", "9038", "9039", "9040", "9041", "9042",
"9043", "9044", "9045", "9046", "9047"))
I’d approach it like this:
First, make a column with a string representing the week:
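The original snippet for this step is not shown here. A minimal sketch of what it could look like, assuming a format string of "%Y-W%W" (calendar year plus week of year with Monday as the first day) and a hypothetical column name `week`; the six rows are date/time pairs taken from the question's dput output:

```r
# Six date/time pairs from the question's dput output
# (dates are stored as day counts since 1970-01-01)
godin <- data.frame(
  date2 = as.Date(c(12669, 12669, 12670, 12670, 13036, 13036),
                  origin = "1970-01-01"),
  TIME  = c("1615", "2130", "0625", "1520", "0310", "1045"),
  stringsAsFactors = FALSE
)

# Assumption: "%Y-W%W" as the week label; 'week' is a hypothetical column name
godin$week <- format(godin$date2, "%Y-W%W")

print(godin)
```

The result is a plain character column, so it can be used directly as a grouping variable.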
This will give you something like "2004-W26", which will be good enough for aggregate.
Then you need to turn your character vector that represents HHMM into an actual time, so that you can use time math on it.
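The conversion code is also missing from this copy. A sketch of one way to do it with `strptime()`, assuming the times are in GMT as stated in the question; the column name `time_ct` is hypothetical, and the result is converted to POSIXct because POSIXlt objects do not sit well in data frame columns:

```r
# TIME values taken from the question's sample data
godin <- data.frame(
  TIME = c("1615", "2130", "0625", "1520", "0310", "1045"),
  stringsAsFactors = FALSE
)

# Parse "HHMM" strings; strptime() fills in the current date for the
# missing date part, so every value ends up on the same day
parsed <- strptime(godin$TIME, format = "%H%M", tz = "GMT")

# Assumption: 'time_ct' is a hypothetical column name
godin$time_ct <- as.POSIXct(parsed)

# Time arithmetic now works, e.g. the gap between the first two samples
gap <- difftime(godin$time_ct[2], godin$time_ct[1], units = "mins")
print(gap)
```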
NOTE: the above is a bit of a hack…
strptime() assumes the current date if none is specified, but that shouldn’t get in the way of this particular application: since all converted times will have the same date, the time part of the mean will be correct. I’ll strip off the date later…
At that point, I think you can simply aggregate:
and get rid of the irrelevant (and erroneous) date part
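The aggregate and date-stripping steps can be sketched together as follows. This is not the original code; the column names `week`, `time_ct` and `mean_time` are hypothetical, and the "%Y-W%W" week label is an assumption. It also assumes `aggregate()` keeps the POSIXct class of the per-week means, which current versions of R do:

```r
# Six date/time pairs from the question's dput output
godin <- data.frame(
  date2 = as.Date(c(12669, 12669, 12670, 12670, 13036, 13036),
                  origin = "1970-01-01"),
  TIME  = c("1615", "2130", "0625", "1520", "0310", "1045"),
  stringsAsFactors = FALSE
)

# Week label and time-of-day column, as in the earlier steps
godin$week    <- format(godin$date2, "%Y-W%W")
godin$time_ct <- as.POSIXct(strptime(godin$TIME, format = "%H%M", tz = "GMT"))

# Mean sampling time per week; the date part of the result is meaningless
weekly <- aggregate(time_ct ~ week, data = godin, FUN = mean)

# Keep only the time of day, dropping the placeholder date
weekly$mean_time <- format(weekly$time_ct, "%H:%M", tz = "GMT")

print(weekly[, c("week", "mean_time")])
```

Note that `format()` truncates rather than rounds, so seconds are simply dropped from the mean. Averaging clock times this way also treats 23:50 and 00:10 as far apart, which may or may not suit sampling that spans midnight.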
Et voilà.
The lesson here is that it’s tricky to push around times with no associated dates in R. I’d love to hear from others who have a better way of doing this.