I just ran the Elastic MapReduce sample application “Apache Log Processing”.
Default:
When I ran it with the default configuration (2 small core instances), it took 19 minutes.
Scale Out:
Then I ran it with 8 small core instances, and it took 18 minutes.
Scale Up:
Then I ran it with 2 large core instances, and it took 14 minutes.
What do you think about the performance of scaling up vs. scaling out when we have bigger data sets?
Thanks.
I would say it depends. I’ve usually found the raw processing speed to be much better on m1.large and m1.xlarge instances. Other than that, as you’ve noticed, the same job will probably take the same amortized or normalized instance hours to complete.
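To make the normalized comparison concrete, here is a rough sketch that weights each run from the question by instance count and size. The weights are assumptions (1 for a small instance, 4 for a large one, following the usual EMR normalization convention), the mapping of "small" and "large" to m1.small and m1.large is also assumed, and the function name is made up for illustration. It ignores the master node and any per-hour billing rounding.

```python
# Assumed normalization factors: 1 for m1.small, 4 for m1.large.
# These weights and the instance-type mapping are assumptions, not
# something stated in the thread.
NORMALIZATION_FACTOR = {"m1.small": 1, "m1.large": 4}


def normalized_instance_minutes(instance_type, instance_count, minutes):
    """Weight wall-clock minutes by instance count and size factor."""
    return NORMALIZATION_FACTOR[instance_type] * instance_count * minutes


# The three runs reported in the question: (type, core instances, minutes).
runs = {
    "default (2 small)": ("m1.small", 2, 19),
    "scale out (8 small)": ("m1.small", 8, 18),
    "scale up (2 large)": ("m1.large", 2, 14),
}

for name, (instance_type, count, minutes) in runs.items():
    print(name, normalized_instance_minutes(instance_type, count, minutes))
```

Under those assumed weights, the three runs come out at 38, 144, and 112 normalized instance-minutes respectively.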
For your jobs, you might want to experiment with a smaller sample data set first, see how much time that takes, and then estimate how long the full job would take on the large data set. I’ve found that to be the best way to estimate the time for job completion.
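A minimal sketch of that kind of extrapolation, assuming run time grows roughly linearly with input size on top of a fixed startup overhead. Both the linear model and the overhead term are assumptions, the function name is hypothetical, and the numbers in the usage line are invented for illustration, not measurements.

```python
def estimate_full_job_minutes(sample_minutes, sample_size_gb, full_size_gb,
                              fixed_overhead_minutes=0.0):
    """Extrapolate a sample run to the full data set, assuming linear scaling.

    fixed_overhead_minutes is the part of the sample run that does not grow
    with input size (for example cluster startup); it is an assumed input.
    """
    if sample_size_gb <= 0:
        raise ValueError("sample_size_gb must be positive")
    if fixed_overhead_minutes > sample_minutes:
        raise ValueError("overhead cannot exceed the measured sample time")
    # Minutes of processing per GB, after removing the fixed overhead.
    minutes_per_gb = (sample_minutes - fixed_overhead_minutes) / sample_size_gb
    return fixed_overhead_minutes + minutes_per_gb * full_size_gb


# Hypothetical example: a 1 GB sample took 6 minutes, 4 of them startup.
print(estimate_full_job_minutes(6, 1, 50, fixed_overhead_minutes=4))
```

Running two or three samples of different sizes would show whether the linear assumption holds for a given job before trusting the estimate.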