I have a data.frame and I want to write it out. The dimensions of my data.frame are 256 rows by 65536 columns. What are faster alternatives to write.csv?
If all of your columns are of the same class, convert to a matrix before writing out; this gave a nearly 6x speedup. You can also look into write.matrix() from package MASS, though it did not prove faster for this example. Maybe I didn't set something up properly.
EDIT
To address the concern raised below that the results above are not fair to data.frame, here are some more results and timings to show that the overall message is still “convert your data object to a matrix if possible. If not possible, deal with it. Alternatively, reconsider why you need to write out a 200MB+ file in CSV format if the timing is of the utmost importance”:
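The original timings did not survive in this copy of the answer, so what follows is only a minimal sketch of how such a comparison could be run, not the benchmark the answer was based on. It assumes all-numeric data; the names m, df and time_write are made up for illustration, and the dimensions are scaled well down from 256 x 65536 so that it finishes quickly. Absolute numbers will vary by machine.

```r
library(MASS)  # provides write.matrix()

set.seed(1)
# Hypothetical test data, much smaller than the case in the question
m  <- matrix(runif(64 * 2048), nrow = 64, ncol = 2048)
df <- as.data.frame(m)

# Illustrative helper: time one writer against a temporary file
time_write <- function(writer, x) {
  f <- tempfile(fileext = ".csv")
  on.exit(unlink(f))  # remove the temporary file when done
  system.time(writer(x, f))[["elapsed"]]
}

timings <- c(
  csv_data_frame = time_write(function(x, f) write.csv(x, f), df),
  csv_matrix     = time_write(function(x, f) write.csv(x, f), m),
  write_matrix   = time_write(function(x, f) write.matrix(x, f, sep = ","), m)
)
print(timings)  # elapsed seconds for each approach
```

To test at the size in the question, change the dimensions in the matrix() call and expect each run to take considerably longer.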
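So, nothing really changes. To confirm this is reasonable, consider the relative time cost of as.data.frame().
So it is not really a big deal, and it does not skew the results as much as the comment below suggests. If you're still not convinced that using write.csv() on large data.frames is a bad idea performance-wise, consult the Note in the manual (?write.table), which warns that writing can be slow for data frames with a large number of columns, since each column may be of a different class and has to be handled separately, and suggests using a matrix when all columns share a class.
Finally, consider moving to a native RData object if you're still losing sleep over saving things faster.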
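As a rough illustration of that last suggestion, here is a minimal sketch that saves an object in R's native binary format with save() and reads it back with load(). The object m, its dimensions, and the file name are made up for illustration, and no particular speedup is claimed.

```r
set.seed(1)
# Hypothetical data, scaled down from the 256 x 65536 case in the question
m <- matrix(runif(64 * 2048), nrow = 64)

f <- tempfile(fileext = ".RData")
elapsed <- system.time(save(m, file = f))[["elapsed"]]  # binary, R-only format

# load() restores objects under their saved names; use a separate
# environment here so the original m is not overwritten
e <- new.env()
loaded_names <- load(f, envir = e)
round_trip_ok <- identical(m, e$m)  # TRUE if the object came back unchanged

unlink(f)
```

saveRDS() and readRDS() do the same job for a single object without tying it to a variable name. Either way the file is only readable from R, so this helps only if whatever consumes the data is also R.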