The Archive Base

Editorial Team
Asked: June 12, 2026

I started using the data.table package in R to boost the performance of my code. I am using the following code:

library(data.table)

sp500 <- read.csv('../rawdata/GMTSP.csv')
days <- c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday")

# Using data.table to get things much, much faster.
# Note: := modifies sp500 by reference, so the "sp500 <-" reassignments are redundant but harmless.
sp500 <- data.table(sp500, key="Date")
sp500 <- sp500[,Date:=as.Date(Date, "%m/%d/%Y")]
sp500 <- sp500[,Weekday:=factor(weekdays(sp500[,Date]), levels=days, ordered=TRUE)]
sp500 <- sp500[,Year:=(as.POSIXlt(Date)$year+1900)]
sp500 <- sp500[,Month:=(as.POSIXlt(Date)$mon+1)]

I noticed that the conversion done by the as.Date function is very slow compared with the other functions that create weekdays, etc. Why is that? Is there a better/faster way to convert to the date format? (If you ask whether I really need the date format: probably yes, because I then use ggplot2 to make plots, which works like a charm with this type of data.)

To be more precise:

> system.time(sp500 <- sp500[,Date:=as.Date(Date, "%m/%d/%Y")])
   user  system elapsed 
 92.603   0.289  93.014 
> system.time(sp500 <- sp500[,Weekday:=factor(weekdays(sp500[,Date]), levels=days, ordered=T)])
   user  system elapsed 
  1.938   0.062   2.001 
> system.time(sp500 <- sp500[,Year:=(as.POSIXlt(Date)$year+1900)])
   user  system elapsed 
  0.304   0.001   0.305 

This is on a MacBook Air (i5) with slightly fewer than 3,000,000 observations.

1 Answer
  1. Editorial Team
    Added an answer on June 12, 2026 at 1:46 pm

    I think it’s just that as.Date converts character to Date via POSIXlt, using strptime. And strptime is very slow, I believe.

    To trace it through yourself, type as.Date, then methods(as.Date), then look at the character method.

    > as.Date
    function (x, ...) 
    UseMethod("as.Date")
    <bytecode: 0x2cf4b20>
    <environment: namespace:base>
    
    > methods(as.Date)
    [1] as.Date.character as.Date.date      as.Date.dates     as.Date.default  
    [5] as.Date.factor    as.Date.IDate*    as.Date.numeric   as.Date.POSIXct  
    [9] as.Date.POSIXlt  
       Non-visible functions are asterisked
    
    > as.Date.character
    function (x, format = "", ...) 
    {
        charToDate <- function(x) {
            xx <- x[1L]
            if (is.na(xx)) {
                j <- 1L
                while (is.na(xx) && (j <- j + 1L) <= length(x)) xx <- x[j]
                if (is.na(xx)) 
                    f <- "%Y-%m-%d"
            }
            if (is.na(xx) || !is.na(strptime(xx, f <- "%Y-%m-%d", 
                tz = "GMT")) || !is.na(strptime(xx, f <- "%Y/%m/%d", 
                tz = "GMT"))) 
                return(strptime(x, f))
            stop("character string is not in a standard unambiguous format")
        }
        res <- if (missing(format)) 
            charToDate(x)
        else strptime(x, format, tz = "GMT")       ####  slow part, I think  ####
        as.Date(res)
    }
    <bytecode: 0x2cf6da0>
    <environment: namespace:base>
    > 
    

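    Why is as.POSIXlt(Date)$year+1900 relatively fast? Again, trace it through (the generic and method listing below are for as.POSIXct, but as.POSIXlt dispatches the same way, and the method that matters here is as.POSIXlt.Date):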

    > as.POSIXct
    function (x, tz = "", ...) 
    UseMethod("as.POSIXct")
    <bytecode: 0x2936de8>
    <environment: namespace:base>
    
    > methods(as.POSIXct)
    [1] as.POSIXct.date    as.POSIXct.Date    as.POSIXct.dates   as.POSIXct.default
    [5] as.POSIXct.IDate*  as.POSIXct.ITime*  as.POSIXct.numeric as.POSIXct.POSIXlt
       Non-visible functions are asterisked
    
    > as.POSIXlt.Date
    function (x, ...) 
    {
        y <- .Internal(Date2POSIXlt(x))
        names(y$year) <- names(x)
        y
    }
    <bytecode: 0x395e328>
    <environment: namespace:base>
    > 
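
The point of that trace is that going from Date to POSIXlt involves no string parsing at all, so the calendar fields are cheap to extract. A minimal sketch in base R (the sample dates and variable names are made up for illustration) that converts once and reuses the result for every field:

```r
# Two illustrative dates; any Date vector works the same way.
d <- as.Date(c("2012-03-15", "2011-12-31"))

# One Date -> POSIXlt conversion, reused for every field we need.
lt <- as.POSIXlt(d)

year  <- lt$year + 1900L   # POSIXlt stores years since 1900
month <- lt$mon + 1L       # POSIXlt stores months as 0-11
wday  <- lt$wday           # 0 = Sunday, ..., 6 = Saturday

data.frame(d, year, month, wday)
```

Converting once and reading several fields from the same object avoids repeating the Date to POSIXlt step for Year, Month and Weekday separately.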
    

    Intrigued, let’s dig into Date2POSIXlt. For this bit we need to grep src/main to find out which .c file to look at.

    ~/R/Rtrunk/src/main$ grep Date2POSIXlt *
    names.c:{"Date2POSIXlt",do_D2POSIXlt,   0,  11, 1,  {PP_FUNCALL, PREC_FN,   0}},
    $
    

    Now we know we need to look for D2POSIXlt:

    ~/R/Rtrunk/src/main$ grep D2POSIXlt *
    datetime.c:SEXP attribute_hidden do_D2POSIXlt(SEXP call, SEXP op, SEXP args, SEXP env)
    names.c:{"Date2POSIXlt",do_D2POSIXlt,   0,  11, 1,  {PP_FUNCALL, PREC_FN,   0}},
    $
    

    Oh, we could have guessed datetime.c. Anyway, looking at the latest live copy:

    datetime.c

    Search in there for D2POSIXlt and you’ll see how simple it is to go from Date (numeric) to POSIXlt. You’ll also see how POSIXlt is one real vector (8 bytes) plus eight integer vectors (4 bytes each). That’s 40 bytes per date!
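
The memory difference can be checked from R itself. This is only a rough sketch: the vector length is arbitrary, object.size() reports approximate sizes, and depending on the R version and platform a POSIXlt object may carry extra fields (such as zone and gmtoff) beyond the ones described above.

```r
n  <- 100000L
d  <- as.Date("2000-01-01") + seq_len(n) - 1L   # Date: one double per element
lt <- as.POSIXlt(d)                             # POSIXlt: a list of per-field vectors

# Field names and storage types of the POSIXlt components
sapply(unclass(lt), typeof)

# Approximate bytes per date for each representation
bytes_date <- as.numeric(object.size(d)) / n
bytes_lt   <- as.numeric(object.size(lt)) / n
c(Date = bytes_date, POSIXlt = bytes_lt)
```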

    So the crux of the issue (I think) is why strptime is so slow, and maybe that can be improved in R. Or just avoid POSIXlt, either directly or indirectly.
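
One way to avoid strptime directly is to skip general-purpose parsing and compute the day number arithmetically. The sketch below is an assumption-laden illustration, not something from R itself: mdy_to_date is a hypothetical helper, it assumes every string is exactly zero-padded "mm/dd/YYYY", and it does no validation of the input. It uses the standard days-from-civil calculation for the proleptic Gregorian calendar.

```r
# Hypothetical helper: fixed-width "mm/dd/YYYY" strings -> Date, without strptime.
mdy_to_date <- function(x) {
  m <- as.integer(substr(x, 1L, 2L))
  d <- as.integer(substr(x, 4L, 5L))
  y <- as.integer(substr(x, 7L, 10L))

  # Treat March as the first month of the year so the leap day falls last.
  y   <- y - (m <= 2L)
  era <- y %/% 400L                                   # 400-year cycle index
  yoe <- y - era * 400L                               # year within the cycle
  doy <- (153L * (m + ifelse(m > 2L, -3L, 9L)) + 2L) %/% 5L + d - 1L
  doe <- yoe * 365L + yoe %/% 4L - yoe %/% 100L + doy # day within the cycle

  # 719468 shifts the count so that 1970-01-01 is day 0, as Date expects.
  as.Date(as.numeric(era * 146097L + doe - 719468L), origin = "1970-01-01")
}

x <- c("01/01/1970", "02/29/2000", "12/31/2011", "03/01/2012")
mdy_to_date(x)
```

Whether this is actually faster than as.Date on a given machine and R version is something to measure with system.time rather than assume.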


    Here’s a reproducible example using the number of items stated in the question (3,000,000):

    > Range = seq(as.Date("2000-01-01"),as.Date("2012-01-01"),by="days")
    > Date = format(sample(Range,3000000,replace=TRUE),"%m/%d/%Y")
    > system.time(as.Date(Date, "%m/%d/%Y"))
       user  system elapsed 
     21.681   0.060  21.760 
    > system.time(strptime(Date, "%m/%d/%Y"))
       user  system elapsed 
     29.594   8.633  38.270 
    > system.time(strptime(Date, "%m/%d/%Y", tz="GMT"))
       user  system elapsed 
     19.785   0.000  19.802 
    

    Passing tz appears to speed up strptime, and as.Date.character does pass it (tz = "GMT"). So maybe it depends on your locale. Either way, strptime appears to be the culprit, not data.table. Perhaps rerun this example and see whether it takes 90 seconds on your machine?
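
If strptime is the bottleneck, one workaround is to call it less often. In the example above only a few thousand distinct date strings are repeated across millions of rows, so each distinct string can be parsed once and the result mapped back to every row. This is a sketch under that assumption (heavily repeated values); as_date_unique is a hypothetical helper name, and the sample is smaller than in the example above to keep it quick.

```r
# Same shape of data as the example above, with fewer rows.
set.seed(1)
Range <- seq(as.Date("2000-01-01"), as.Date("2012-01-01"), by = "days")
Date  <- format(sample(Range, 300000, replace = TRUE), "%m/%d/%Y")

# Hypothetical helper: parse each distinct string once, then map back to rows.
as_date_unique <- function(x, format) {
  ux <- unique(x)
  as.Date(ux, format = format)[match(x, ux)]
}

direct     <- as.Date(Date, format = "%m/%d/%Y")
via_unique <- as_date_unique(Date, "%m/%d/%Y")

# Compare timings on your own machine; results vary by R version and platform.
system.time(as.Date(Date, format = "%m/%d/%Y"))
system.time(as_date_unique(Date, "%m/%d/%Y"))
```

The gain shrinks as the share of distinct values grows, since unique() and match() have their own cost.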
