I’ve hit a wall trying to merge a large file with a smaller one. I have read many other posts about memory management in R and haven’t been able to find a non-extreme method of resolving it (i.e., one that doesn’t involve going 64-bit, uploading to a cluster, etc.). I’ve tried the bigmemory package a bit but haven’t been able to find a solution. I thought I’d try here before I throw my hands up in frustration.
The code I’m running looks like this:
```r
#rm(list=ls())
localtempdir <- "F:/Temp/"
memory.limit(size=4095)
# [1] 4095
memory.size(max=TRUE)
# [1] 487.56
gc()
#          used (Mb) gc trigger  (Mb) max used  (Mb)
# Ncells 170485  4.6     350000   9.4   350000   9.4
# Vcells 102975  0.8   52633376 401.6 62529185 477.1
client_daily <- read.csv(paste(localtempdir, "client_daily.csv", sep=""), header=TRUE)
object.size(client_daily)
# >130MB
sbp_demos <- read.csv(paste(localtempdir, "sbp_demos", sep=""))
object.size(sbp_demos)
# >0.16MB
client_daily <- merge(client_daily, sbp_demos, by.x="OBID", by.y="OBID", all.x=TRUE)
# Error: cannot allocate vector of size 5.0 MB
```
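For reference, here is a toy-sized version of the same `merge()` call. The columns `spend` and `region` are hypothetical stand-ins for whatever the real files contain. It shows that `all.x=TRUE` performs a left join: every `client_daily` row is kept, and rows whose `OBID` has no match in `sbp_demos` get `NA`.

```r
# Hypothetical toy data; only OBID comes from the question
client_daily <- data.frame(OBID = c(1L, 2L, 2L, 3L),
                           spend = c(10, 20, 5, 7))
sbp_demos <- data.frame(OBID = c(1L, 2L),
                        region = c("N", "S"),
                        stringsAsFactors = FALSE)

# Left join: keep all rows of client_daily; OBID 3 has no demo row -> NA region
merged <- merge(client_daily, sbp_demos, by.x = "OBID", by.y = "OBID", all.x = TRUE)
print(merged)
```

On the full data, `merge()` builds the result as a new object alongside both inputs, which is why a 130MB table can exhaust a 32-bit address space even though the lookup table is tiny.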
I guess I’m asking: are there any clever ways around this that don’t involve buying new hardware?
- I need to be able to `merge` to create a bigger object.
- I’ll then need to run regressions etc. on that bigger object.
Should I give up? Should bigmemory be able to help solve this?
Any guidance greatly appreciated.
Details: R version 2.13.1 (2011-07-08), platform i386-pc-mingw32/i386 (32-bit); Intel Core 2 Duo @ 2.33GHz, 3.48GB RAM.
As Chase already mentioned, you can try data.table or perhaps sqldf.
For either one, you will likely get much better performance out of it if you set the indexes (keys, in data.table terms) appropriately.
With data.table you would:
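A minimal sketch of a keyed left join with data.table, assuming the package is installed. The toy columns `spend` and `region` are hypothetical stand-ins for the real files. Keying both tables on `OBID` with `setkey()` lets `sbp_demos[client_daily]` look up each `client_daily` row by binary search on the key; with the default `nomatch = NA`, unmatched rows are kept with `NA`, mirroring `all.x=TRUE`.

```r
library(data.table)

# Hypothetical toy data standing in for client_daily.csv and sbp_demos
client_daily <- data.table(OBID = c(1L, 2L, 2L, 3L), spend = c(10, 20, 5, 7))
sbp_demos    <- data.table(OBID = c(1L, 2L), region = c("N", "S"))

# Set the join key on both tables (sorts them by reference, no copy)
setkey(client_daily, OBID)
setkey(sbp_demos, OBID)

# X[Y]: for each row of client_daily, find matching sbp_demos rows;
# OBID 3 has no match and is kept with region = NA (left join)
merged <- sbp_demos[client_daily]
print(merged)
```

With the real files, the keyed tables could be built directly from the CSVs with something like `data.table(read.csv(...), key = "OBID")`.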
If you go the `sqldf` route, look at example 4i on its website. Again, make sure you use indexes correctly.
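A sketch in the spirit of that indexing approach, assuming the sqldf package with its default SQLite backend is installed; the toy data and the index names `idx_cd` and `idx_dm` are hypothetical. `sqldf()` accepts a vector of statements, runs them in one SQLite session, and returns the result of the last one. Both tables are named bare in the `create index` statements so they get uploaded; the `main.` prefix in the `select` is meant to point it at those uploaded, indexed tables rather than fresh copies.

```r
library(sqldf)

# Hypothetical toy data standing in for the two CSV files
client_daily <- data.frame(OBID = c(1L, 2L, 2L, 3L),
                           spend = c(10, 20, 5, 7))
sbp_demos <- data.frame(OBID = c(1L, 2L),
                        region = c("N", "S"),
                        stringsAsFactors = FALSE)

# Index the join column on both tables, then do the left join in SQLite
merged <- sqldf(c(
  "create index idx_cd on client_daily(OBID)",
  "create index idx_dm on sbp_demos(OBID)",
  "select cd.OBID as OBID, cd.spend as spend, dm.region as region
     from main.client_daily cd
     left join main.sbp_demos dm on cd.OBID = dm.OBID"
))
print(merged)
```

Because the join itself happens inside SQLite, only the final result has to be materialized as an R data frame.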