I want to write a job in which each mapper checks whether a file from HDFS is already stored on the node where that mapper is running. If it is not, I want to retrieve it from HDFS and store it locally on that node. Is this possible?
EDIT: What I am trying to do is (3) Preprocessing for Repartition Join, as described here: link
The DistributedCache feature in Hadoop can be used to distribute side data or auxiliary data that a job needs in order to complete. Here (1, 2) are some interesting articles on the topic.
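To illustrate the "check locally, fetch only if missing" logic from the question, here is a minimal sketch in plain Java. It deliberately uses no Hadoop classes: an ordinary directory stands in for HDFS, and the class name `LocalSideFileCache`, the method `ensureLocal`, and the file names are all hypothetical. In a real job the copy step would presumably go through Hadoop's `FileSystem` API (for example `copyToLocalFile`), or you would let DistributedCache do the localization for you.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class LocalSideFileCache {

    /**
     * Returns the node-local copy of a side file, fetching it only if it is
     * not already present. 'remoteFile' is a stand-in for a file on HDFS
     * (assumption: a plain local path is used here so the sketch runs anywhere).
     */
    public static Path ensureLocal(Path remoteFile, Path localDir) throws IOException {
        Files.createDirectories(localDir);
        Path localCopy = localDir.resolve(remoteFile.getFileName());

        if (Files.notExists(localCopy)) {
            // Copy to a temporary file first, then move it into place, so that
            // another task on the same node never reads a half-written file.
            Path tmp = Files.createTempFile(localDir, "fetch-", ".tmp");
            Files.copy(remoteFile, tmp, StandardCopyOption.REPLACE_EXISTING);
            Files.move(tmp, localCopy, StandardCopyOption.ATOMIC_MOVE);
        }
        return localCopy;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical directories standing in for HDFS and the node's local disk.
        Path remoteDir = Files.createTempDirectory("fake-hdfs");
        Path localDir = Files.createTempDirectory("node-local");

        Path remote = remoteDir.resolve("side-data.txt");
        Files.write(remote, "k1\tv1\n".getBytes(StandardCharsets.UTF_8));

        // First call copies the file; second call finds it and skips the copy.
        Path first = ensureLocal(remote, localDir);
        Path second = ensureLocal(remote, localDir);

        System.out.println("local copy: " + first);
        System.out.println("same path on second call: " + first.equals(second));
    }
}
```

In a mapper, a call like this would typically sit in `setup()`, so the check runs once per task rather than once per record. Note that a copy made this way is never refreshed if the source file changes.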