This is more of a beginner’s question. Say I have the following code: library(multicore)

Question

Asked: May 26, 2026

This is more of a beginner’s question. Say I have the following code:

library("multicore")
library("iterators")
library("foreach")
library("doMC")

registerDoMC(16)

foreach(i = 1:M) %dopar% {
   ##do stuff
}

This code will then run on 16 cores, if they are available. Now, if I understand correctly, a single Amazon EC2 instance gives me only a few cores, depending on the instance type. So if I want to run simulations on 16 cores, I need to use several instances, which, as far as I understand, means launching new R processes. But then I need to write additional code outside of R to gather the results.

So my question is: is there an R package that lets me launch EC2 instances from within R, automagically distributes the load between these instances, and gathers the results in the R session I started from?

1 Answer

Editorial Team · Answer 1 · 2026-05-26T19:16:31+00:00

To be precise, the largest instance type on EC2 currently has 8 cores, so anyone, R user or not, would need multiple instances in order to run concurrently on more than 8 cores.

If you want to use more instances, then you have two options for deploying R: “regular” R invocations or MapReduce invocations. In the former case, you will have to set up code to launch instances, distribute tasks (e.g. the independent iterations in foreach), return results, etc. This is doable, but you’re not likely to enjoy it. For this, you can use something like rmr or RHipe to manage a MapReduce grid, or you can use snow or one of the many other HPC tools to create a simple grid. Using snow may make it easier to keep your code intact, but you will have to learn how to tie the pieces together.
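As a rough illustration of the "simple grid" route, here is a minimal sketch using the parallel package, which ships with R and provides snow-style socket clusters. It is not a full EC2 setup: the `hosts` vector and the `simulate_one` function are hypothetical placeholders, and two local workers stand in for remote instances. On EC2 you would, presumably, replace them with the hostnames of instances you have already launched and can reach over passwordless SSH with R installed.

```r
library(parallel)  # ships with R; offers snow-style socket clusters

# Hypothetical worker list: two local workers stand in for remote machines.
# On EC2 these would be hostnames of already-running instances (assumption).
hosts <- c("localhost", "localhost")

# Start one R worker process per entry in `hosts`.
cl <- makePSOCKcluster(hosts)

# Placeholder for one independent iteration (the "do stuff" of the question).
simulate_one <- function(i) {
  i^2
}

M <- 8  # number of iterations, chosen arbitrarily for the sketch

# Distribute the iterations across the workers and collect the results
# back in the R session that created the cluster.
results <- parLapply(cl, seq_len(M), simulate_one)

# Always shut the workers down when finished.
stopCluster(cl)

print(unlist(results))
```

If you would rather keep the foreach loop unchanged, the same cluster object can be registered as a foreach backend, for example with `registerDoSNOW(cl)` from doSNOW or `registerDoParallel(cl)` from doParallel. Launching and terminating the instances themselves is still left to you.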

In the latter case, you can build upon infrastructure that Amazon has provided, such as Elastic MapReduce (EMR) and packages that make that simpler, such as JD’s segue. I’d recommend segue as a good starting point, as others have done, as it has a gentler learning curve. The developer is also on SO, so you can easily ~~embarrass~~ query him when it breaks.
