I have a data frame with 4 columns… Date, Hour, Loc, Value. What I’d

Asked: June 14, 2026

I have a data frame with 4 columns… Date, Hour, Loc, Value.

What I’d like to do with the data is come up with summary statistics for each unique Date/Hour/Loc. This seems to be easy, since I can do

x <- subset(my.df[,4], 
            my.df[,2]==(some parameter) & my.df[,3]==(another parameter)
           )

and then get whatever summary statistics I want from x. The tricky part is that I also want summary statistics for the differences between those values. For instance, I want to take the difference between the value when loc=1 and the value when loc=2 with hour=1, but either loc may be missing some days. One idea, which will probably work, is to reshape my.df to wide twice: first with timevar=loc, then with timevar=hour, so that I end up with wide.df with columns Date, value.1.1, value.1.2, etc., where the first integer is the loc, the second integer is the hour, and each row is a unique date.

Is there a more straightforward way to do this that won’t involve 20 minutes of reshaping? The initial df is about 9493401 rows with 4 variables, and I then stretch it out to 720 rows with 14857 columns.

@Brandon: Here’s the str output. I haven’t tried your suggestions yet, though.

    'data.frame':   9493401 obs. of  4 variables:
     $ Loc  : int  1 1 1 1 1 1 1 1 1 1 ...
     $ Date : POSIXct, format: "2010-10-29" "2010-10-29" ...
     $ Hour : int  1 2 3 4 5 6 7 8 9 10 ...
     $ Value: num  7.63 4.07 4.9 1.61 0.34 -5.23 2.11 2.39 7.2 4.41 ...

1 Answer

Editorial Team · Answer 1 · 2026-06-14T22:44:44+00:00

There’s dcast from reshape2 which seems to be pretty snappy in this regard:

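library(reshape2)

# Simulated data with the same number of rows as the question's data frame
dat <- data.frame(date=sample(1:100,9493401,replace=TRUE),
                  hour=rep(1:24,length.out=9493401),
                  loc=rep(letters[1:9],length.out=9493401),
                  value=rnorm(9493401))

# One row per date/hour, one column per loc.
# date/hour/loc combinations repeat in this simulated data, so dcast has to
# aggregate; with no fun.aggregate given it defaults to length (counts).
dcast(dat,date + hour ~ loc)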
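
As a rough sketch of how the wide result could feed the loc-to-loc differences described in the question, here is a toy example. The data frame `toy`, the renamed columns `loc1`/`loc2`, and the direction of the subtraction (loc 2 minus loc 1) are all assumptions made for illustration, and dates are kept as character strings for simplicity. It assumes each Date/Hour/Loc combination occurs at most once, so no aggregation is needed.

```r
library(reshape2)

# Hypothetical toy data: two locs, two hours, three dates.
# Loc 2 has no reading for 2010-10-30 at hour 1.
toy <- data.frame(
  Date  = c("2010-10-29", "2010-10-30", "2010-10-31",
            "2010-10-29", "2010-10-31",
            "2010-10-29", "2010-10-30", "2010-10-31",
            "2010-10-29", "2010-10-30", "2010-10-31"),
  Hour  = c(1, 1, 1,  1, 1,  2, 2, 2,  2, 2, 2),
  Loc   = c(1, 1, 1,  2, 2,  1, 1, 1,  2, 2, 2),
  Value = c(5, 7, 9,  1, 2,  4, 4, 4,  1, 2, 3)
)

# One row per Date/Hour, one column per Loc; absent combinations become NA.
wide <- dcast(toy, Date + Hour ~ Loc, value.var = "Value")
names(wide)[3:4] <- c("loc1", "loc2")

# Assumed direction: loc 2 minus loc 1. Rows with a missing loc give NA.
wide$d21 <- wide$loc2 - wide$loc1

# Mean difference per hour; the formula interface drops NA rows by default.
res <- aggregate(d21 ~ Hour, data = wide, FUN = mean)
res
```

Only one reshape (by loc) is needed here, because the hour stays as an ordinary column and the summary is grouped by it afterwards.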

You can also do things like counts of loc/hour:

dcast(dat, date + hour ~ loc*hour)
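
To make the aggregation explicit rather than relying on the default, `fun.aggregate` and `value.var` can be passed directly. The following is a minimal sketch on a hypothetical toy data frame (`toy` and its values are made up for illustration), showing counts and means per date/loc.

```r
library(reshape2)

# Hypothetical toy data with repeated date/loc combinations
toy <- data.frame(
  date  = c(1, 1, 1, 2, 2, 2),
  loc   = c("a", "a", "b", "a", "b", "b"),
  value = c(0.5, 1.5, 2.0, 3.0, 4.0, 6.0)
)

# Number of observations per date/loc
counts <- dcast(toy, date ~ loc, value.var = "value", fun.aggregate = length)

# Mean value per date/loc
means <- dcast(toy, date ~ loc, value.var = "value", fun.aggregate = mean)

counts
means
```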

You’ll need to provide more information if you want an answer that’s specific to your case.
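
If only a few loc pairs matter, one possible alternative that avoids reshaping the whole data frame is to merge the two loc subsets on date and hour in base R. This is only a sketch: `loc_diff` is a hypothetical helper, the column names are taken from the question's str output, and the toy data is invented. The inner join keeps only the Date/Hour pairs present at both locs, which is one way to handle the missing days.

```r
# Hypothetical helper: differences between two locs, matched on Date and Hour.
loc_diff <- function(df, loc_a, loc_b) {
  a <- df[df$Loc == loc_a, c("Date", "Hour", "Value")]
  b <- df[df$Loc == loc_b, c("Date", "Hour", "Value")]
  # Inner join: Date/Hour pairs missing at either loc are dropped.
  m <- merge(a, b, by = c("Date", "Hour"), suffixes = c(".a", ".b"))
  m$diff <- m$Value.b - m$Value.a  # assumed direction: loc_b minus loc_a
  m
}

# Hypothetical toy data: loc 2 has no reading for 2010-10-30 at hour 1.
toy <- data.frame(
  Date  = c("2010-10-29", "2010-10-30", "2010-10-31",
            "2010-10-29", "2010-10-31",
            "2010-10-29", "2010-10-30", "2010-10-31",
            "2010-10-29", "2010-10-30", "2010-10-31"),
  Hour  = c(1, 1, 1,  1, 1,  2, 2, 2,  2, 2, 2),
  Loc   = c(1, 1, 1,  2, 2,  1, 1, 1,  2, 2, 2),
  Value = c(5, 7, 9,  1, 2,  4, 4, 4,  1, 2, 3)
)

d <- loc_diff(toy, 1, 2)

# Any summary statistic can then be applied per hour.
hour_means <- tapply(d$diff, d$Hour, mean)
hour_means
```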
