If I have a 4-node cluster in which one machine is the namenode and the remaining three are datanodes, and I set the number of reducers to 1, which of the datanodes will run the reducer?
The namenode and datanodes are HDFS processes, not MapReduce ones. I assume you have three TaskTracker nodes; one of them will run the reducer, but there is no guarantee which one. Hadoop generally moves computation close to the data it needs, but reducers pull their input from the mappers rather than from HDFS, so data locality does not determine where they run. The most you can say is that Hadoop will prefer a less loaded node with at least one free reduce slot.
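To make the "less loaded node with a free reduce slot" idea concrete, here is a toy sketch in Python. It is not Hadoop's actual scheduler: the `TaskTracker` class, the `pick_reducer_node` function, and the slot counts are all hypothetical, made up purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskTracker:
    """Hypothetical stand-in for a task tracker node (not a Hadoop class)."""
    name: str
    reduce_slots: int     # total reduce slots configured on this node
    running_reduces: int  # reduce tasks currently occupying slots

    @property
    def free_reduce_slots(self) -> int:
        return self.reduce_slots - self.running_reduces


def pick_reducer_node(trackers: List[TaskTracker]) -> Optional[TaskTracker]:
    """Toy policy: among nodes with a free reduce slot, take the least loaded."""
    candidates = [t for t in trackers if t.free_reduce_slots > 0]
    if not candidates:
        return None  # the reducer has to wait until a slot frees up
    # Load is the fraction of reduce slots in use; ties go to the first node.
    return min(candidates, key=lambda t: t.running_reduces / t.reduce_slots)


# Assumed cluster state: three worker nodes with two reduce slots each.
trackers = [
    TaskTracker("node1", reduce_slots=2, running_reduces=2),  # full
    TaskTracker("node2", reduce_slots=2, running_reduces=1),  # half busy
    TaskTracker("node3", reduce_slots=2, running_reduces=0),  # idle
]

chosen = pick_reducer_node(trackers)
print(chosen.name if chosen else "no free reduce slot")
```

With this made-up state the idle node is picked. On a real cluster the outcome depends on the scheduler in use and on what the nodes are doing at that moment, which is why you cannot predict the node in advance.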