The datanode-namenode communication uses the org.apache.hadoop.ipc package, while the inter-datanode communication is based on simple socket communication.
What is the motivation behind such a design?
These are two different tasks with different requirements, so the two implementations can be explained by a desire to suit each set of requirements.
DataNode -> NameNode communication is more complex than DataNode-DataNode communication, which justifies RPC.
DataNode-DataNode communication, on the one hand, is extremely simple and, on the other, requires efficient transport of large amounts of data. Plain sockets are arguably the most efficient solution for this case.
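To illustrate why bulk transfer needs so little machinery, here is a minimal sketch of streaming a block of bytes over a plain socket. It is not Hadoop's actual data transfer protocol: the class name BlockStreamSketch, the methods sendBlock, receiveBlock and roundTrip, and the 8-byte length header are all assumptions made up for this example.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Arrays;

// Hypothetical sketch, not Hadoop code: one length header, then raw bytes.
public class BlockStreamSketch {

    // Sender side: write the block length, then the payload, with no
    // per-call method dispatch or argument serialization in between.
    static void sendBlock(OutputStream rawOut, byte[] block) throws IOException {
        DataOutputStream out = new DataOutputStream(new BufferedOutputStream(rawOut));
        out.writeLong(block.length);
        out.write(block);
        out.flush();
    }

    // Receiver side: read the length header, then exactly that many bytes.
    static byte[] receiveBlock(InputStream rawIn) throws IOException {
        DataInputStream in = new DataInputStream(new BufferedInputStream(rawIn));
        long length = in.readLong();
        byte[] block = new byte[(int) length]; // assumes the block fits in one array
        in.readFully(block);
        return block;
    }

    // Loopback demo: a sender thread connects to a local server socket
    // and the calling thread receives the block.
    static byte[] roundTrip(byte[] block) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // port 0 = any free port
            Thread sender = new Thread(() -> {
                try (Socket s = new Socket(InetAddress.getLoopbackAddress(),
                                           server.getLocalPort())) {
                    sendBlock(s.getOutputStream(), block);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            sender.start();
            byte[] received;
            try (Socket peer = server.accept()) {
                received = receiveBlock(peer.getInputStream());
            }
            sender.join();
            return received;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] block = new byte[1 << 20]; // 1 MiB of dummy data
        Arrays.fill(block, (byte) 7);
        byte[] copy = roundTrip(block);
        System.out.println("blocks equal: " + Arrays.equals(block, copy));
    }
}
```

The whole exchange is a single header followed by a byte stream. An RPC layer pays off when there are many distinct calls with structured arguments and return values, as in the DataNode -> NameNode case, and adds little when the only operation is moving bytes.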