As the title goes. Why is the lubridate function so much slower?
library(lubridate)
library(microbenchmark)
Dates <- sample(c(dates = format(seq(ISOdate(2010,1,1), by='day', length=365), format='%d-%m-%Y')), 50000, replace = TRUE)
microbenchmark(as.POSIXct(Dates, format = "%d-%b-%Y %H:%M:%S", tz = "GMT"), times = 100)
microbenchmark(dmy(Dates, tz ="GMT"), times = 100)
Unit: milliseconds
                                                           expr      min       lq   median       uq      max
1 as.POSIXct(Dates, format = "%d-%b-%Y %H:%M:%S", tz = "GMT") 103.1902 104.3247  108.675 109.2632  149.871
2                                        dmy(Dates, tz = "GMT") 184.4871 194.1504 197.8422 214.3771 268.4911
For the same reason cars are slow in comparison to riding on top of rockets. The added ease of use and safety make cars much slower than a rocket but you’re less likely to get blown up and it’s easier to start, steer, and brake a car. However, in the right situation (e.g., I need to get to the moon) the rocket is the right tool for the job. Now if someone invented a car with a rocket strapped to the roof we’d have something.
Start by looking at what dmy is doing and you’ll see the reason for the difference in speed (by the way, from your benchmarks I wouldn’t say that lubridate is that much slower, as these are in milliseconds):

dmy  # type this into the command line and you get:

Right away I see parse_date and num_to_date and make_format. Makes one wonder what all these guys are. Let’s see:

parse_date
num_to_date
make_format

Wow, we got strsplit-ting, expand-ing.grid-s, paste-ing, ifelse-ing, unname-ing, etc., plus a Whole Lotta Error Checking Going On (play on the Zep song). So what we have here is some nice syntactic sugar. Mmmmm tasty, but it comes with a price: speed.

Compare that to as.POSIXct:

There’s a lot more internal coding and less error checking going on with as.POSIXct. So you have to ask: do I want ease and safety, or speed and power? Depends on the job.
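To make the trade-off concrete, here is a rough base-R sketch. It is not lubridate's actual code: parse_direct and parse_friendly are hypothetical names invented for illustration. The "friendly" version checks its input and tries several candidate formats before it finds the right one, so it does strictly more work than a direct call with a known format. The sketch assumes the numeric-month format "%d-%m-%Y", which is how the strings in the question are generated.

```r
# Sample data built the same way as in the question (numeric months).
set.seed(1)
days  <- seq(as.Date("2010-01-01"), by = "day", length.out = 365)
dates <- sample(format(days, "%d-%m-%Y"), 50000, replace = TRUE)

# Direct call: the caller supplies the format, so nothing has to be guessed.
parse_direct <- function(x) {
  as.POSIXct(x, format = "%d-%m-%Y", tz = "GMT")
}

# Hypothetical convenience parser (NOT lubridate's implementation):
# validates the input, tries each candidate format in turn, and fails
# with a readable error if none of them parses every element.
parse_friendly <- function(x) {
  stopifnot(is.character(x))
  candidates <- c("%d/%m/%Y", "%d.%m.%Y", "%d-%m-%Y")
  for (fmt in candidates) {
    out <- as.POSIXct(x, format = fmt, tz = "GMT")
    if (!anyNA(out)) return(out)
  }
  stop("no candidate format matched every element")
}

# Time both versions; the results should agree, only the cost differs.
t_direct   <- system.time(a <- parse_direct(dates))[["elapsed"]]
t_friendly <- system.time(b <- parse_friendly(dates))[["elapsed"]]
same_result <- identical(a, b)
```

On this input the friendly parser has to make three full passes over the data before it succeeds, which is the same kind of overhead, in miniature, that the convenience layers in dmy add on top of the underlying parsing.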