I’m asking this as a general/beginner question about R, not specific to the package I was using.
I have a dataframe with 3 million rows and 15 columns. I don’t consider this a huge dataframe, but maybe I’m wrong.
I was running the following script and it’s been running for 2+ hours – I imagine there must be something I can do to speed this up.
Code:
library(plyr)
ddply(orders, .(ClientID), summarise, NumOrders = length(OrderID))
This does not seem like an overly intensive script, or at least I don’t think it is.
In a database, you could add an index to a table to increase join speed. Is there a similar action in R I should be doing on import to make functions/packages run faster?
With the suggested data.table package, the following operation should do the job within a second:
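A minimal sketch of such a per-client count, assuming the orders data frame from the question with ClientID and OrderID columns. The toy data and the names orders_dt and num_orders are made up for illustration, and setkey is shown only as the closest analogue to a database index, not as a required step:

```r
library(data.table)

# Toy stand-in for the real 3-million-row orders table (hypothetical data)
orders <- data.frame(
  ClientID = c(1L, 1L, 2L, 3L, 3L, 3L),
  OrderID  = 101:106
)

# as.data.table() returns a copy; setDT(orders) would convert in place instead
orders_dt <- as.data.table(orders)

# Optional: sort by ClientID and mark it as the key, roughly like adding an index
setkey(orders_dt, ClientID)

# Count orders per client; .N is the number of rows in each group
num_orders <- orders_dt[, .(NumOrders = .N), by = ClientID]
print(num_orders)
```

The grouping itself does not depend on the key, so the setkey call can be dropped if the table is only aggregated once. It tends to matter more for repeated joins or lookups on ClientID.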