When running Hadoop in EC2, I seem to have two options: A: Manage the

Question

Editorial Team

Asked: May 20, 2026 (2026-05-20T01:07:02+00:00)

When running Hadoop in EC2, I seem to have two options:

A: Manage the cluster myself, using the EC2-specific shell scripts that come with Hadoop.
B: Use Elastic MapReduce, and pay a little extra for the convenience.

I’m leaning towards B, but I’d appreciate some advice from people with more experience. Here are my questions:

Are there any tasks that can be done with one of these methods but not the other?
Are there other options besides these two that I’m overlooking?
If I choose B, how easy would it be to go back to A? That is, what’s the danger of vendor lock-in?

1 Answer

Editorial Team · Answer 1 · 2026-05-20T01:07:02+00:00

I have been told by people close to the Amazon Elastic MapReduce (EMR) development team that EMR has at least two other advantages: (a) Amazon actively applies bug fixes and performance enhancements to the Hadoop code base used on EMR, and (b) Amazon runs a high-performance network between EMR servers and S3 servers that may not be available between EC2 servers and S3 servers.

UPDATE: See @mat’s comments that refute the rumored advantages of using EMR.
