I have a data frame with 4 columns: Date, Hour, Loc, and Value.
What I'd like to do is compute summary statistics for each unique Date/Hour/Loc combination. That part seems easy, since I can do
x <- subset(my.df[, 4],
            my.df[, 2] == some.parameter & my.df[, 3] == another.parameter)
and then get whatever summary statistics I want from x. The tricky part is that I also want summary statistics for the differences between these values. For instance, I want the difference between Value at Loc=1 and Value at Loc=2 for Hour=1, but either location may have missing days. One idea that will probably work is to reshape my.df to be wider twice: first make it wide with timevar=Loc, and then reshape that with timevar=Hour, so that wide.df has columns Date, value.1.1, value.1.2, etc., where the first integer is the Loc, the second integer is the Hour, and each row is a unique date.
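The two-step reshape described above can be sketched with base R's reshape(). This is a minimal illustration on a small invented data frame with the same column names as my.df (the values are made up, and Loc 2 is deliberately missing 2010-10-30, Hour 1); it says nothing about speed on the real data. Base reshape() names the resulting columns Value.<Loc>.<Hour>, and a day missing at either location shows up as NA in the difference.

```r
# Toy stand-in for my.df (made-up values). Loc 2 has no row for
# 2010-10-30, Hour 1, to mimic a missing day.
my.df <- data.frame(
  Loc   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  Date  = as.Date(c("2010-10-29", "2010-10-29", "2010-10-30", "2010-10-30",
                    "2010-10-29", "2010-10-29", "2010-10-30")),
  Hour  = c(1L, 2L, 1L, 2L, 1L, 2L, 2L),
  Value = c(7.63, 4.07, 4.90, 1.61, 0.34, -5.23, 2.39)
)

# Step 1: one row per Date/Hour, one Value column per Loc (Value.1, Value.2)
wide1 <- reshape(my.df, direction = "wide",
                 idvar = c("Date", "Hour"), timevar = "Loc", v.names = "Value")

# Step 2: one row per Date, with columns named Value.<Loc>.<Hour>
wide.df <- reshape(wide1, direction = "wide",
                   idvar = "Date", timevar = "Hour",
                   v.names = c("Value.1", "Value.2"))

# Loc 2 minus Loc 1 at Hour 1; NA wherever either location lacks that day
diff.h1 <- wide.df$Value.2.1 - wide.df$Value.1.1
mean(diff.h1, na.rm = TRUE)
```

The NA handling comes for free here, but the wide frame grows with every Loc/Hour pair, which is what makes this route expensive at the scale described.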
Is there a more straightforward way to do this that won't involve 20 minutes of reshaping? The initial data frame is about 9493401 rows with 4 variables, and I then stretch it out to 720 rows with 14857 columns.
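One alternative, swapped in here rather than taken from the thread, is to stay in long format and never build the wide table: merge the rows for one location against the rows for another on Date and Hour, take the difference, and summarise by hour. The sketch below uses a hypothetical helper, loc_diff(), and a small invented data frame. merge()'s default inner join simply drops any Date/Hour pair that is missing at either location, which sidesteps the missing-days problem.

```r
# Toy stand-in for my.df (made-up values). Loc 2 has no row for
# 2010-10-30, Hour 1, to mimic a missing day.
my.df <- data.frame(
  Loc   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  Date  = as.Date(c("2010-10-29", "2010-10-29", "2010-10-30", "2010-10-30",
                    "2010-10-29", "2010-10-29", "2010-10-30")),
  Hour  = c(1L, 2L, 1L, 2L, 1L, 2L, 2L),
  Value = c(7.63, 4.07, 4.90, 1.61, 0.34, -5.23, 2.39)
)

# Hypothetical helper: differences (loc.b minus loc.a) matched on Date and Hour
loc_diff <- function(df, loc.a, loc.b) {
  a <- df[df$Loc == loc.a, c("Date", "Hour", "Value")]
  b <- df[df$Loc == loc.b, c("Date", "Hour", "Value")]
  # Inner join: a Date/Hour pair missing at either location is dropped
  m <- merge(a, b, by = c("Date", "Hour"), suffixes = c(".a", ".b"))
  m$Diff <- m$Value.b - m$Value.a
  m
}

d12 <- loc_diff(my.df, 1L, 2L)

# Summary statistic of the differences, one row per Hour
diff.stats <- aggregate(Diff ~ Hour, data = d12, FUN = mean)
print(diff.stats)
```

To cover every pair of locations, one could loop over the columns of combn(sort(unique(my.df$Loc)), 2); whether that is actually faster than the reshape on the full data set is untested here.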
@Brandon: Here’s the str output. I haven’t tried your suggestions yet though.
'data.frame': 9493401 obs. of 4 variables:
$ Loc : int 1 1 1 1 1 1 1 1 1 1 ...
$ Date: POSIXct, format: "2010-10-29" "2010-10-29" ...
$ Hour : int 1 2 3 4 5 6 7 8 9 10 ...
$ Value : num 7.63 4.07 4.9 1.61 0.34 -5.23 2.11 2.39 7.2 4.41 ...
There's dcast from reshape2, which seems to be pretty snappy in this regard. You can also do things like counts of Loc/Hour combinations.
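The code that originally accompanied this answer is missing. The following is not a reconstruction of it, only a minimal sketch assuming the reshape2 package is installed and using a small invented data frame: a single dcast() call gives one row per Date and one column per Loc/Hour pair (missing combinations become NA), and fun.aggregate = length gives the number of observations per Loc/Hour.

```r
library(reshape2)  # assumes reshape2 is installed

# Toy stand-in for my.df (made-up values). Loc 2 has no row for
# 2010-10-30, Hour 1, to mimic a missing day.
my.df <- data.frame(
  Loc   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  Date  = as.Date(c("2010-10-29", "2010-10-29", "2010-10-30", "2010-10-30",
                    "2010-10-29", "2010-10-29", "2010-10-30")),
  Hour  = c(1L, 2L, 1L, 2L, 1L, 2L, 2L),
  Value = c(7.63, 4.07, 4.90, 1.61, 0.34, -5.23, 2.39)
)

# One row per Date, one column per Loc/Hour pair, in a single call
wide.df <- dcast(my.df, Date ~ Loc + Hour, value.var = "Value")

# Counts of observations for each Loc/Hour combination
counts <- dcast(my.df, Loc ~ Hour, value.var = "Value", fun.aggregate = length)
print(counts)
```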
You’ll need to provide more information if you want an answer that’s specific to your case.