When the JobTracker assigns a map task to a TaskTracker, does it need to talk to the NameNode, or can it get the information from the InputSplit itself?
When I look at the code, I see that the InputSplits are packed with BlockLocations. Does the JobTracker rely on this information, or does it need to consult the NameNode?
The client calculates the splits and writes them, together with the split meta information, to HDFS.
You can have a look at JobSplit.SplitMetaInfo in Hadoop 1.x: that is where the serialization is implemented, and it also serializes the locations. The JobTracker just picks up these serialized splits and schedules them; the locations are only a hint for faster execution when slots are available on those hosts.
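To make the idea concrete, here is a minimal, self-contained sketch. It does not use the Hadoop classes: SplitMeta, roundTrip and pickSplitFor are hypothetical names, the wire format is simplified to plain java.io calls, and the scheduling rule is only an assumption about what "locations as a hint" means (prefer a split stored on the requesting host, otherwise hand out any pending split).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

public class SplitLocalitySketch {

    /** Hypothetical stand-in for the per-split meta info written by the client. */
    static final class SplitMeta {
        final String[] locations; // hosts that hold the split's blocks
        final long startOffset;   // where the split's data starts in the split file
        final long length;        // size of the input data in bytes

        SplitMeta(String[] locations, long startOffset, long length) {
            this.locations = locations;
            this.startOffset = startOffset;
            this.length = length;
        }

        // Simplified format: host count, each host name, then the two longs.
        void write(DataOutputStream out) throws IOException {
            out.writeInt(locations.length);
            for (String host : locations) {
                out.writeUTF(host);
            }
            out.writeLong(startOffset);
            out.writeLong(length);
        }

        static SplitMeta read(DataInputStream in) throws IOException {
            String[] hosts = new String[in.readInt()];
            for (int i = 0; i < hosts.length; i++) {
                hosts[i] = in.readUTF();
            }
            return new SplitMeta(hosts, in.readLong(), in.readLong());
        }
    }

    /** Serializes and deserializes in memory, mimicking client write and scheduler read. */
    static SplitMeta roundTrip(SplitMeta meta) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            meta.write(out);
            out.flush();
            return SplitMeta.read(
                new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the index of a split local to host, else 0 if any split is pending, else -1. */
    static int pickSplitFor(String host, List<SplitMeta> pending) {
        for (int i = 0; i < pending.size(); i++) {
            if (Arrays.asList(pending.get(i).locations).contains(host)) {
                return i; // data-local assignment
            }
        }
        return pending.isEmpty() ? -1 : 0; // no local split: the hint is simply ignored
    }

    public static void main(String[] args) {
        List<SplitMeta> pending = Arrays.asList(
            roundTrip(new SplitMeta(new String[] {"host-a", "host-b"}, 0L, 64L)),
            roundTrip(new SplitMeta(new String[] {"host-c"}, 64L, 64L)));
        System.out.println("host-c gets split " + pickSplitFor("host-c", pending));
        System.out.println("host-z gets split " + pickSplitFor("host-z", pending));
    }
}
```

The point of the sketch is that the scheduling side only reads the host names that were serialized with each split; nothing in it needs to ask where the blocks are.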