I have the following, somewhat large dataset:
> dim(dset)
[1] 422105 25
> class(dset)
[1] "data.frame"
>
Without doing anything, the R process seems to take about 1GB of RAM.
I am trying to run the following code:
dset <- ddply(dset, .(tic), transform,
              date.min <- min(date),
              date.max <- max(date),
              daterange <- max(date) - min(date),
              .parallel = TRUE)
When I run that code, RAM usage skyrockets. It completely saturated 60 GB of RAM on a 32-core machine. What am I doing wrong?
If performance is an issue, it might be a good idea to switch to data.tables, from the package of the same name. They are fast. You'd do something roughly equivalent to this:
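A minimal sketch of what that could look like. Only the column names tic and date come from the question; the toy data frame is made up for illustration, and the real dset would simply take its place. The idea is to add the grouped columns by reference with `:=` rather than rebuilding a data frame for each group.

```r
library(data.table)

# Toy stand-in for dset (hypothetical values; only the column names
# `tic` and `date` come from the question).
dset <- data.frame(
  tic  = c("A", "A", "B", "B", "B"),
  date = as.Date(c("2010-01-01", "2010-03-01",
                   "2011-05-01", "2011-05-11", "2011-06-01"))
)

dt <- as.data.table(dset)

# Add the per-group min and max as new columns, by reference.
dt[, `:=`(date.min = min(date), date.max = max(date)), by = tic]

# The range is then a plain vectorised subtraction, no grouping needed.
dt[, daterange := date.max - date.min]

print(dt)
```

Because `:=` modifies dt in place, there is no need to assign the result back to dset, and the whole table should not need to be copied to add the three columns.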