We are launching hadoop cluster on amazon ec2 and recently we are having network issues like master unable to connect to slave. We thought the reason is due to amazon throttling the network connections over a limit. So, we tried to establish a connection after a random delay from each slave node. But, that didn’t help.
Are there any other suggestions?
Thank you
Bala
Have you tried using the hadoop-ec2 scripts from cloudera? I’ve been using them for setting up occasional hadoop clusters for my thesis research and I’ve found them to work quite well. The setup takes a few minutes but once it’s setup you just do
and it setups all the stuff you need, and usually does a really good job. Occasionally, a node won’t startup or something, but it’s easy enough to terminate the cluster and try again, and it doesn’t cost too much.
You can find the instructions for setting them up here: