As I sit here waiting for some R scripts to run, I was wondering: is there any way to parallelize rbind in R?
I frequently end up waiting for this call to complete, since I deal with large amounts of data:
do.call("rbind", LIST)
I doubt that you can make this faster by parallelizing it. For one thing, you would probably have to write it yourself: thread one rbinds items 1 and 2 while thread two rbinds items 3 and 4, and so on, and when they are done the results are bound together again, something like that. I don’t see a non-C way of improving on this. For another, it would involve copying large amounts of data between your threads, which is typically the thing that is slow in the first place.
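For illustration only, the pairwise scheme described above could be sketched roughly as follows. The helper name pairwise_rbind and its cores argument are made up for this example, and it uses forked processes via parallel::mclapply rather than real threads, so each partial result still has to be copied back to the parent process. With cores = 1 it simply runs sequentially; values above 1 rely on forking, which is not available on Windows.

```r
library(parallel)

# Hypothetical helper: rbind neighbouring pairs, then repeat on the
# partial results until a single data.frame is left.
pairwise_rbind <- function(lst, cores = 1L) {
  stopifnot(length(lst) > 0L)
  while (length(lst) > 1L) {
    # Group the indices into neighbouring pairs: (1, 2), (3, 4), ...
    # An odd element at the end forms a group of its own.
    idx <- unname(split(seq_along(lst), ceiling(seq_along(lst) / 2)))
    # Each group is bound by its own worker; unname() keeps list names
    # from leaking into the row names of the result.
    lst <- mclapply(idx, function(i) do.call(rbind, unname(lst[i])),
                    mc.cores = cores)
  }
  lst[[1L]]
}

# Usage on a toy list of identically structured data.frames
LIST <- lapply(1:5, function(i) data.frame(id = i, value = i * 10))
result <- pairwise_rbind(LIST)
```

Whether this beats a plain do.call("rbind", LIST) would have to be measured on your own data; because of the copying, I would not expect much.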
In C, you can share objects between threads, so then you could have all your threads write in the same memory. I wish you the best of luck with that 🙂
Finally, as an aside: rbinding data.frames is just slow. If you know up front that the structure of all your data.frames is exactly the same, and they don’t contain pure character columns, you can probably use the trick from this answer to one of my questions. If your data.frames do contain character columns, I suspect that you’re best off handling these separately (do.call(c, lapply(LIST, "[[", "myCharColName"))) and then performing the trick with the rest, after which you can reunite them.
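A rough sketch of that split-and-reunite idea, on made-up toy data. The column name myCharColName is just the placeholder from above, and the plain rbind in step 2 is only a stand-in for whatever faster method you apply to the non-character columns; it is not the trick itself.

```r
# Toy input: two data.frames with the same structure, one character column
LIST <- list(
  data.frame(id = 1:2, value = c(0.5, 1.5),
             myCharColName = c("a", "b"), stringsAsFactors = FALSE),
  data.frame(id = 3:4, value = c(2.5, 3.5),
             myCharColName = c("c", "d"), stringsAsFactors = FALSE)
)

# 1. Concatenate the character column on its own
chars <- do.call(c, lapply(LIST, "[[", "myCharColName"))

# 2. Combine the remaining columns (stand-in: ordinary rbind)
rest <- do.call(rbind, lapply(LIST, function(d) {
  d[setdiff(names(d), "myCharColName")]
}))

# 3. Reunite the two parts; the character column ends up last
combined <- rest
combined$myCharColName <- chars
```

This only works if the rows stay in the same order in both parts, so avoid anything that reorders or drops rows between steps 1 and 3.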