I am working on a dataframe in which I have previously integrated the date and time into one column (called timestamp):
a <-c(1:21)
D <- c("2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14", "2012/12/14")
Time <- c("18:40:37", "18:40:48", "18:40:58", "18:41:08","18:41:18","18:41:28","18:41:38","18:41:48","18:41:58","18:42:08","18:42:18","18:42:28","18:42:38","18:42:48","18:42:58","18:43:08","18:43:18","18:42:28", "18:44:18", "18:44:28", "18:44:28")
df1 <- data.frame(a, D, Time)
df1 <- within(df1, { timestamp=format(as.POSIXct(paste(D, Time)), "%d/%m/%Y %H:%M:%S") })
How would I subset the dataframe to exclude values after a specific point in time? I found some code on Stack Overflow for a similar question that I thought might help, but I am struggling to get the time element to work:
subset(df1, format.Date(timestamp, "%d/%m/%Y %H:%M:%S") > "14/12/2012 18:42:00")
Any advice would be very much appreciated.
Edit:
I am struggling to get the code detailed below working on my real data. A dput() of the first four rows of my dataframe is listed at the end of this post. I previously used the line of code recommended by @Arun to timestamp my data.
gps <- within(gps, { timestamp=format(as.POSIXct(paste(LOCAL.DATE, LOCAL.TIME)),
"%d/%m/%Y %H:%M:%S") })
If I try and apply the second part of the code (strptime…) I get the error message:
Error in `$<-.data.frame`(`*tmp*`, "timestamp", value = list(sec = c(37, :
  replacement has 30208 rows, data has 4
This sort of explains why, when I try to apply the code to the whole of my data, I get 8 rows of many numbers, separated by commas. If you can help me in any way, I would be extremely grateful.
structure(list(timestamp = c("14/12/2012 18:40:37", "14/12/2012 18:40:48",
"14/12/2012 18:40:58", "14/12/2012 18:41:08"), LATITUDE = c(54.77769505,
54.77765729, 54.77768751, 54.7777021), LONGITUDE = c(-1.56627049,
-1.56639255, -1.56626555, -1.56662523), HEIGHT = c(" 173.911 M",
" 161.742 M", " 146.905 M", " 138.016 M"), SPEED = c(" 0.465 km/h",
" 0.728 km/h", " 4.574 km/h", " 17.335 km/h")), .Names = c("timestamp",
"LATITUDE", "LONGITUDE", "HEIGHT", "SPEED"), row.names = c(NA,
4L), class = "data.frame")
Second edit: Many thanks @Arun for the solution. I was a bit confused about how I was supposed to use the code, as my data is originally in separate date and time columns (LOCAL.DATE and LOCAL.TIME). So I used the first line of code from your original solution, and then the second line from your revised edits.
This is the code I used:
gps <- within(gps, { timestamp=format(as.POSIXct(paste(LOCAL.DATE, LOCAL.TIME)),
"%d/%m/%Y %H:%M:%S") })
gps$timestamp <- strptime(gps$timestamp, "%Y-%m-%d %H:%M:%S")
However now I get a string of NAs (and some -1s). Apologies if I used the code in the incorrect way…
Third edit
Apologies for the confusion @Arun. When I try it both ways round for the date column, I get errors. If I keep it as yr/m/d, which is how the original data was formatted, I get a dput() of:
structure(list(timestamp = c("2012/12/14 18:40:37", "2012/12/14 18:40:48",
"2012/12/14 18:40:58", "2012/12/14 18:41:08"), LATITUDE = c(54.77769505,
54.77765729, 54.77768751, 54.7777021), LONGITUDE = c(-1.56627049,
-1.56639255, -1.56626555, -1.56662523), HEIGHT = c(" 173.911 M",
" 161.742 M", " 146.905 M", " 138.016 M"), SPEED = c(" 0.465 km/h",
" 0.728 km/h", " 4.574 km/h", " 17.335 km/h")), .Names = c("timestamp",
"LATITUDE", "LONGITUDE", "HEIGHT", "SPEED"), row.names = c(NA,
4L), class = "data.frame")
If I then use:
gps2$timestamp <- strptime(gps2$timestamp, "%Y/%m/%d %H:%M:%S")
… and try to view the dataframe in RStudio’s workspace window, the R session aborts.
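It's better to load character vectors as such, and not as factors, by passing stringsAsFactors = FALSE to data.frame(). Then, convert timestamp from a character string to a date-time object with strptime(), using a format string that matches how the column is actually written.
Now, try subsetting this way: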
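A minimal sketch of this step, assuming a shortened six-row version of the example data (the rows are illustrative, not the full df1 from the question). Wrapping strptime() in as.POSIXct() is my own assumption: strptime() returns a list-based POSIXlt object, which tends to behave awkwardly as a data frame column, whereas POSIXct is a plain numeric vector underneath. The tz = "UTC" argument is also an assumption, used only to keep the result independent of the local time zone.

```r
# Shortened, illustrative version of the example data (six rows).
df1 <- data.frame(
  a    = 1:6,
  D    = rep("2012/12/14", 6),
  Time = c("18:40:37", "18:41:08", "18:41:58",
           "18:42:08", "18:43:08", "18:44:28"),
  stringsAsFactors = FALSE  # keep D and Time as character, not factor
)

# Parse date and time with a format that matches the text ("2012/12/14 18:40:37"),
# then convert the POSIXlt result of strptime() to POSIXct for storage.
df1$timestamp <- as.POSIXct(
  strptime(paste(df1$D, df1$Time), "%Y/%m/%d %H:%M:%S", tz = "UTC")
)

str(df1)
```

After this, timestamp is a real date-time column, so it can be compared with other times directly instead of as text.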
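A sketch of the subsetting step, again on an illustrative six-row data frame rather than the real one. It assumes timestamp is stored as text in day/month/year order and that "exclude values after a specific point in time" means keeping rows at or before the cutoff; flip the comparison if you want the later rows instead. The cutoff variable name is my own.

```r
# Illustrative data: timestamps stored as text in dd/mm/yyyy order.
df1 <- data.frame(
  a = 1:6,
  timestamp = c("14/12/2012 18:40:37", "14/12/2012 18:41:08",
                "14/12/2012 18:41:58", "14/12/2012 18:42:08",
                "14/12/2012 18:43:08", "14/12/2012 18:44:28"),
  stringsAsFactors = FALSE
)

fmt <- "%d/%m/%Y %H:%M:%S"

# Convert both the column and the cutoff to POSIXct so that the comparison
# is made on actual times rather than on character strings.
df1$timestamp <- as.POSIXct(df1$timestamp, format = fmt, tz = "UTC")
cutoff <- as.POSIXct("14/12/2012 18:42:00", format = fmt, tz = "UTC")

# Keep only the rows at or before the cutoff.
kept <- subset(df1, timestamp <= cutoff)
kept
```

Comparing the formatted strings, as in the attempt in the question, sorts them alphabetically, which only matches chronological order by accident for a day-first format.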
Edit: Try this:
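A sketch for the real data, using the first four rows from the dput() in the question (only the timestamp and SPEED columns, to keep it short). The NAs are what I would expect when the format string does not match the text: "%Y-%m-%d" cannot parse "14/12/2012". Storing the result as POSIXct instead of the raw strptime() output is my assumption for avoiding the list-valued column that the error message shows.

```r
# First four rows from the question's dput(), trimmed to two columns.
gps <- data.frame(
  timestamp = c("14/12/2012 18:40:37", "14/12/2012 18:40:48",
                "14/12/2012 18:40:58", "14/12/2012 18:41:08"),
  SPEED = c(" 0.465 km/h", " 0.728 km/h", " 4.574 km/h", " 17.335 km/h"),
  stringsAsFactors = FALSE
)

# A format that does not match the text gives NA for every row.
bad <- as.POSIXct(strptime(gps$timestamp, "%Y-%m-%d %H:%M:%S", tz = "UTC"))

# The format has to mirror the string: day/month/year, separated by slashes.
gps$timestamp <- as.POSIXct(
  strptime(gps$timestamp, "%d/%m/%Y %H:%M:%S", tz = "UTC")
)

# Keep the fixes recorded at or before 18:41:00.
cutoff <- as.POSIXct("2012-12-14 18:41:00", tz = "UTC")
gps_early <- subset(gps, timestamp <= cutoff)
gps_early
```

If the column is kept in its original yr/m/d form instead, the same code should work with "%Y/%m/%d %H:%M:%S" as the format.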