I’ve got access to a big, powerful cluster. I’m a halfway decent R programmer, but totally new to shell commands (and to terminal commands in general, beyond the basics needed to use Ubuntu).
I want to use this cluster to run a bunch of parallel processes in R, and then I want to combine their results. Specifically, I have a problem analogous to:
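# Fit a linear model, then draw N simulated predictions for otherdata
my.function <- function(data, otherdata, N) {
  mod <- lm(y ~ x, data = data)
  a <- predict(mod, newdata = otherdata, se.fit = TRUE)
  b <- rnorm(N, a$fit, a$se.fit)
  b
}
# Call it 1000 times (pseudocode), then collect everything in one list
r1 <- my.function(data, otherdata, N)
r2 <- my.function(data, otherdata, N)
r3 <- my.function(data, otherdata, N)
r4 <- my.function(data, otherdata, N)
...
r1000 <- my.function(data, otherdata, N)
results <- list(r1, r2, r3, r4, ... r1000)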
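A runnable sketch of the same pattern, assuming small simulated inputs (dat, newdat and all.draws are hypothetical names standing in for the real data): a single lapply() call replaces the 1000 separate assignments and the final list() call, and returns the results already collected in a list. This is the sequential baseline; the parallel versions only change how the 1000 calls are dispatched.

```r
# Same model-and-simulate step as in the question
my.function <- function(data, otherdata, N) {
  mod <- lm(y ~ x, data = data)
  a <- predict(mod, newdata = otherdata, se.fit = TRUE)
  rnorm(N, a$fit, a$se.fit)
}

# Hypothetical simulated inputs standing in for the real data
set.seed(1)
dat <- data.frame(x = 1:50)
dat$y <- 2 * dat$x + rnorm(50)
newdat <- data.frame(x = 51)
N <- 10

# One call per replicate; lapply collects the 1000 results in a list,
# replacing r1, r2, ..., r1000 and list(r1, ..., r1000)
results <- lapply(seq_len(1000), function(i) my.function(dat, newdat, N))

# Example of "doing something" with all results: stack them into a matrix
all.draws <- do.call(rbind, results)
dim(all.draws)  # 1000 rows, N = 10 columns
```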
The above is just a dumb example, but basically I want to do something 1000 times in parallel, and then do something with all of the results from the 1000 processes.
How do I submit 1000 jobs simultaneously to the cluster, and then combine all the results, like in the last line of the code?
Any recommendations for well-written manuals/references for me to go RTFM with would be welcome as well. Unfortunately, the documents that I’ve found aren’t particularly intelligible.
Thanks in advance!
You can combine plyr with the doMC package (a parallel backend for the foreach package) to run the calls in parallel on the cores of a single machine.

Edit: If you’re talking about submitting simultaneous jobs, don’t you have an LSF license? If so, you can use bsub to submit as many jobs as you need, and LSF takes care of load balancing and the like.

Edit 2: A small note on load balancing (example using LSF’s bsub): what you mention is similar to what I wrote here about LSF. You can submit jobs in batches. For example, in LSF you use bsub to submit a job to the cluster; this places the job in the queue, and once the number of processors you requested becomes available, they are allocated to your job and it starts running. You can pause, restart, and suspend your jobs, and much more. qsub (used by schedulers such as PBS/Torque and Grid Engine) follows a similar concept. The learning curve may be a bit steep, but it is worth it.
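One way to get the "submit 1000 jobs, then combine the results" workflow out of bsub is an LSF job array: every array element runs the same R script, reads its own index from the LSB_JOBINDEX environment variable that LSF sets for array elements, and saves its result to a separate file; a final step reads all the files back into one list. This is a minimal sketch, assuming a shared filesystem visible to every node. The script name sim.R, the job name sim, the helper functions get.index, run.one and combine.results, and the file naming scheme are all hypothetical. So that the sketch runs outside the cluster, the demo at the end sets LSB_JOBINDEX itself for five simulated elements.

```r
# Same function as in the question
my.function <- function(data, otherdata, N) {
  mod <- lm(y ~ x, data = data)
  a <- predict(mod, newdata = otherdata, se.fit = TRUE)
  rnorm(N, a$fit, a$se.fit)
}

# Hypothetical submission from the shell: 1000 array elements, each running
# the same worker script (sim.R), with one log file per element (%I = index):
#   bsub -J "sim[1-1000]" -o "sim_%I.out" Rscript sim.R
# Once all elements have finished, a second script calls combine.results().

# Read this element's index; LSF sets LSB_JOBINDEX in each array element.
get.index <- function() {
  idx <- as.integer(Sys.getenv("LSB_JOBINDEX", unset = NA))
  if (is.na(idx)) stop("LSB_JOBINDEX is not set; not running as an LSF array element")
  idx
}

# Worker: run one replicate and save it to a file named after its index.
run.one <- function(index, outdir, data, otherdata, N) {
  set.seed(index)  # distinct, reproducible seed per replicate
  res <- my.function(data, otherdata, N)
  saveRDS(res, file.path(outdir, sprintf("result_%04d.rds", index)))
  invisible(res)
}

# Combine: read all saved results back into one list, ordered by index
# (the zero-padded file names sort correctly up to 9999 elements).
combine.results <- function(outdir) {
  files <- list.files(outdir, pattern = "^result_[0-9]+\\.rds$", full.names = TRUE)
  lapply(sort(files), readRDS)
}

# Local demo: simulate five array elements by setting LSB_JOBINDEX ourselves.
set.seed(1)
dat <- data.frame(x = 1:50)
dat$y <- 2 * dat$x + rnorm(50)
newdat <- data.frame(x = 51)
N <- 10
outdir <- file.path(tempdir(), "sim_results")
dir.create(outdir, showWarnings = FALSE)
for (i in 1:5) {
  Sys.setenv(LSB_JOBINDEX = i)  # what LSF would do for element i
  run.one(get.index(), outdir, dat, newdat, N)
}
Sys.unsetenv("LSB_JOBINDEX")
results <- combine.results(outdir)
length(results)  # 5
```

On the cluster, sim.R would load the real data, call run.one(get.index(), ...) once, and exit; the combining script would call combine.results() and then do whatever comes after results = list(...) in the question. Writing one file per element keeps the array elements fully independent, so a failed element can simply be resubmitted on its own.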