I have the following column in my data frame:
DateTime
1 2011-10-03 08:00:04
2 2011-10-03 08:00:05
3 2011-10-03 08:00:06
4 2011-10-03 08:00:09
5 2011-10-03 08:00:15
6 2011-10-03 08:00:24
7 2011-10-03 08:00:30
8 2011-10-03 08:00:42
9 2011-10-03 08:01:01
10 2011-10-03 08:01:24
11 2011-10-03 08:01:58
12 2011-10-03 08:02:34
13 2011-10-03 08:03:25
14 2011-10-03 08:04:26
15 2011-10-03 08:06:00
With dput:
> dput(smallDF)
structure(list(DateTime = structure(c(1317621604, 1317621605,
1317621606, 1317621609, 1317621615, 1317621624, 1317621630, 1317621642,
1317621661, 1317621684, 1317621718, 1317621754, 1317621805, 1317621866,
1317621960, 1317622103, 1317622197, 1317622356, 1317622387, 1317622463,
1317622681, 1317622851, 1317623061, 1317623285, 1317623404, 1317623498,
1317623612, 1317623849, 1317623916, 1317623994, 1317624174, 1317624414,
1317624484, 1317624607, 1317624848, 1317625023, 1317625103, 1317625179,
1317625200, 1317625209, 1317625229, 1317625238, 1317625249, 1317625264,
1317625282, 1317625300, 1317625315, 1317625339, 1317625353, 1317625365,
1317625371, 1317625381, 1317625395, 1317625415, 1317625423, 1317625438,
1317625458, 1317625469, 1317625487, 1317625500, 1317625513, 1317625533,
1317625548, 1317625565, 1317625581, 1317625598, 1317625613, 1317625640,
1317625661, 1317625674, 1317625702, 1317625715, 1317625737, 1317625758,
1317625784, 1317625811, 1317625826, 1317625841, 1317625862, 1317625895,
1317625909, 1317625935, 1317625956, 1317625973, 1317626001, 1317626043,
1317626062, 1317626100, 1317626113, 1317626132, 1317626153, 1317626179,
1317626212, 1317626239, 1317626271, 1317626296, 1317626323, 1317626361,
1317626384, 1317626407), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = "DateTime", row.names = c(NA,
-100L), class = "data.frame")
My goal: I want to calculate the time difference, in seconds, between each measurement.
Edit:
I’m looking to get the following result, where the time difference (in seconds) between each data point is calculated, except for the first value of the day (line 3), where the time is calculated relative to 8 am:
DateTime Seconds
1 2011-09-30 21:59:02 6
2 2011-09-30 21:59:04 2
3 2011-10-03 08:00:04 4
4 2011-10-03 08:00:05 1
5 2011-10-03 08:00:06 1
6 2011-10-03 08:00:09 3
7 2011-10-03 08:00:15 6
8 2011-10-03 08:00:24 9
9 2011-10-03 08:00:30 6
10 2011-10-03 08:00:42 12
11 2011-10-03 08:01:01 19
12 2011-10-03 08:01:24 23
13 2011-10-03 08:01:58 34
14 2011-10-03 08:02:34 36
15 2011-10-03 08:03:25 51
16 2011-10-03 08:04:26 61
17 2011-10-03 08:06:00 94
However, the measurements start at 8:00 am, so if the value is the first of the day, the number of seconds relative to 8:00 am needs to be calculated. In the example above, the first measurement ends at 8:00:04, so using the $sec attribute of POSIX could work here, but on other days the first value may occur a few minutes after 8:00.
I’ve tried to achieve that goal with the following function:
SecondsInBar <- function(x, startTime){
    # First data point or first of day
    if (x == 1 || x > 1 && x$wkday != x[-1]$wkday){
        seconds <- as.numeric(difftime(x,
                        as.POSIXlt(startTime, format = "%H:%M:%S"),
                        units = "secs"))
    # else calculate time difference
    } else {
        seconds <- as.numeric(difftime(x, x[-1], units = "secs"))
    }
    return (seconds)
}
Which then could be called with SecondsInBar(smallDF$DateTime, "08:00:00").
There are at least two problems with this function, but I don’t know how to solve these:
- The code segment x$wkday != x[-1]$wkday returns a "$ operator is invalid for atomic vectors" error.
- And the as.POSIXlt(startTime, format = "%H:%M:%S") uses the current date, which makes the difftime calculation erroneous.
My question:
Where am I going wrong with this function?
And: is this approach a viable way or should I approach it from a different angle?
How about something along these lines:
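A minimal sketch of one way to do this, not a definitive implementation: prepend the reference time to the vector and take successive differences. It assumes the first few epoch values from the dput output, displayed in Europe/London, and a reference time in that same timezone; the names times, start and seconds are illustrative.

```r
# Assumption: the first six epoch values from the dput output above,
# displayed in Europe/London, where they read 07:00:04, 07:00:05, ...
times <- as.POSIXct(c(1317621604, 1317621605, 1317621606,
                      1317621609, 1317621615, 1317621624),
                    origin = "1970-01-01", tz = "Europe/London")

# Reference time, expressed in the same timezone as the timestamps.
start <- as.POSIXct("2011-10-03 07:00:00", tz = "Europe/London")

# Prepend the reference time, then take successive differences in seconds.
seconds <- as.numeric(diff(c(start, times)), units = "secs")
seconds
```

Passing units = "secs" to as.numeric keeps the result in seconds even when diff would otherwise pick minutes or hours for larger gaps.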
Note that I used 7am because, when I copied your data, it was interpreted as BST.
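The two errors from the question can also be reproduced in isolation. The sketch below assumes a single illustrative timestamp in UTC; x_ct, x_lt, ref_today and ref are made-up names.

```r
# Assumption: one illustrative timestamp, stored in UTC.
x_ct <- as.POSIXct("2011-10-03 08:00:04", tz = "UTC")

# POSIXct is a number of seconds with a class attribute, so `$` fails on it.
err <- tryCatch(x_ct$wday, error = function(e) conditionMessage(e))
err

# POSIXlt is a list of components, so `$` works. Note that the weekday
# component is called wday (0 = Sunday); there is no wkday component.
x_lt <- as.POSIXlt(x_ct)
x_lt$wday

# Parsing a time without a date fills in the current date.
ref_today <- as.POSIXlt("08:00:00", format = "%H:%M:%S", tz = "UTC")
format(ref_today, "%Y-%m-%d")

# Pasting the measurement's own date in front gives a usable reference.
ref <- as.POSIXct(paste(format(x_ct, "%Y-%m-%d"), "08:00:00"), tz = "UTC")
secs <- as.numeric(difftime(x_ct, ref, units = "secs"))
secs
```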
As for your errors, you can’t use $ to get elements of a date with POSIXct (which is how smallDF$DateTime is defined), only with POSIXlt. And for the second error, if you don’t supply a date, it has to assume the current date, as there is no other information to draw upon.
Edit
Now that it’s been clarified, I would propose a different approach:
split your data frame by day, and then combine the times with the reference time and do diff on that, using lapply to loop over days:
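A sketch of how that could look, under stated assumptions: the data frame df below is made-up illustrative data in UTC, already sorted by time, and seconds_since_previous is a hypothetical helper name, not a base R function.

```r
# Assumption: illustrative data spanning two days, in UTC, sorted by time.
df <- data.frame(DateTime = as.POSIXct(c(
  "2011-09-30 08:00:06", "2011-09-30 08:00:08",
  "2011-10-03 08:00:04", "2011-10-03 08:00:05", "2011-10-03 08:00:09"
), tz = "UTC"))

# Hypothetical helper: seconds since the previous measurement, with the
# first measurement of the day measured from the daily start time.
seconds_since_previous <- function(times, start = "08:00:00", tz = "UTC") {
  # Reference time on the same calendar day as the measurements.
  day <- format(times[1], "%Y-%m-%d", tz = tz)
  ref <- as.POSIXct(paste(day, start), tz = tz)
  # Prepend the reference, then take successive differences in seconds.
  as.numeric(diff(c(ref, times)), units = "secs")
}

# Split the timestamps by calendar day and apply the helper to each day.
by_day <- split(df$DateTime, format(df$DateTime, "%Y-%m-%d", tz = "UTC"))
df$Seconds <- unlist(lapply(by_day, seconds_since_previous), use.names = FALSE)
df
```

Because split orders the groups by date string, the unlisted result lines up with the rows only when the data frame is already in chronological order; sort it first if that is not guaranteed.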