I have a 9-column data.frame (x) with millions of rows. I was able to read it into R and run some modifications on it, and that code executed without a problem. However, when I try to write it out to a .csv file using
write.csv(x,file=argv[2],quote=F,row.names=F)
I get an error which says
Error: cannot allocate vector of size 1.2Gb
This makes no sense as the data is already in memory, the computations done, and all I want to do is write it out to disk. Also, while I monitored the memory, the virtual memory size almost doubled for this process during this write phase. Would writing a custom C function to write out this data.frame help? Any suggestions/help/pointers appreciated.
ps: I’m running all this in a 64 bit ubuntu box with about 24G RAM. Overall space may not be an issue. The data size is about 10G
You have to understand that R functions will often copy their arguments if they modify them. The functional programming paradigm employed by R means that functions don’t change the objects passed in as arguments, so R makes a copy whenever a change is needed in the course of executing a function.
If you build R with memory tracing support you can see this copying in action for any operation you are having trouble with. Using the airquality example data set and tracing memory use, I see that 6 copies of the data are being made as R prepares it for writing to file.
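A minimal sketch of how that tracing might be run, assuming your R build reports memory profiling support via capabilities("profmem"). The variable names (aq, out_file) are just for illustration, and the number of copies reported can differ between R versions, so treat the count above as what I saw rather than a fixed figure.

```r
# Trace copies of a data frame while write.csv() prepares it for output.
# tracemem() only works if R was built with memory profiling support,
# so check for that first.
aq <- airquality
out_file <- tempfile(fileext = ".csv")

if (capabilities("profmem")) {
  tracemem(aq)   # start reporting whenever 'aq' is duplicated
  write.csv(aq, file = out_file, quote = FALSE, row.names = FALSE)
  untracemem(aq) # stop tracing
} else {
  # No tracing available: just do the write.
  write.csv(aq, file = out_file, quote = FALSE, row.names = FALSE)
}
```

Each line starting with tracemem[ that is printed during the write.csv() call corresponds to one duplication of the object.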
Clearly that is eating up the 24Gb of RAM you have available; the error says that R needs another 1.2Gb of RAM to complete an operation.
The simplest solution to start with would be to write the file in chunks. Note that write.csv() ignores the append argument (with a warning), so use write.table() with sep = "," instead. Write the first set of lines of data out using append = FALSE, then use append = TRUE and col.names = FALSE for subsequent calls to write.table() writing out the remaining chunks. You may need to play around with this to find a chunk size that will not exceed the available memory.
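The chunked approach could be sketched roughly as follows. The helper name write_csv_in_chunks and the default chunk_size are my own placeholders, not anything built into R; the sketch uses write.table() with sep = "," because write.csv() does not allow append to be changed.

```r
# Write a data frame to a CSV file a block of rows at a time, so that only
# one chunk's worth of data is copied and formatted at once.
# 'write_csv_in_chunks' and 'chunk_size' are hypothetical names for this sketch.
write_csv_in_chunks <- function(x, file, chunk_size = 100000L) {
  n <- nrow(x)
  if (n == 0L) {
    # Nothing but the header to write.
    write.table(x, file = file, sep = ",", quote = FALSE,
                row.names = FALSE, col.names = TRUE)
    return(invisible(file))
  }
  starts <- seq(1L, n, by = chunk_size)
  for (i in seq_along(starts)) {
    rows <- starts[i]:min(starts[i] + chunk_size - 1L, n)
    first <- i == 1L
    write.table(x[rows, , drop = FALSE], file = file, sep = ",",
                quote = FALSE, row.names = FALSE,
                col.names = first,   # header only on the first chunk
                append = !first)     # overwrite first, then append
  }
  invisible(file)
}
```

You would call it as write_csv_in_chunks(x, argv[2]) and lower chunk_size if memory is still tight. Smaller chunks use less memory per call at the cost of more calls.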