I’m a bit confused about how HDFS and MapReduce actually work in fully distributed mode.
Suppose I am running a word count program, and I only give it the paths of ‘hdfs-site’ and ‘core-site’.
How are things actually carried out then?
Is the program distributed to each node, or what?
Yes, your program is distributed, but it would be wrong to say that it is distributed to every node. Rather, Hadoop looks at the data you are working with, splits it into smaller parts (subject to some constraints from the configuration), and then moves your code to the nodes in HDFS where those parts are stored (I assume you have a DataNode and a TaskTracker running on those nodes). The map part is executed on these nodes first and produces some intermediate data, which is stored on those nodes. As the map tasks finish, the second part of your job, the reduce phase, starts.
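To make the split-then-map step concrete, here is a minimal single-process sketch in plain Python. It is not Hadoop code: `split_input`, `map_split` and the `lines_per_split` parameter are hypothetical names made up for illustration, and a list of lines stands in for a file stored in HDFS.

```python
def split_input(lines, lines_per_split):
    """Cut the input into chunks; a stand-in for Hadoop's input splits (assumption)."""
    return [lines[i:i + lines_per_split]
            for i in range(0, len(lines), lines_per_split)]


def map_split(split):
    """Word-count map function: emit a (word, 1) pair for every word in the split."""
    return [(word.lower(), 1) for line in split for word in line.split()]


# Toy input; on a real cluster this would be a file stored in HDFS.
lines = ["the quick brown fox", "jumps over the lazy dog", "the end"]

splits = split_input(lines, lines_per_split=2)

# One map task per split. Here they run one after another in a single process;
# on a cluster each would run on a node that holds that part of the data.
map_outputs = [map_split(split) for split in splits]

for task_id, output in enumerate(map_outputs):
    print(f"map task {task_id}: {output}")
```

Note that the number of map tasks follows from the number of splits, not from the number of nodes, which is why the code is not simply copied to every machine.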
The reducers are started on some of the nodes (again, you configure how many of them), fetch the data from the mappers, aggregate it, and write the output to HDFS.
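The fetch-and-aggregate step can be sketched the same way. This is again plain Python rather than Hadoop code, and `partition`, `shuffle` and `reduce_counts` are hypothetical helper names. The CRC32-based partitioner is only a deterministic stand-in for whatever partitioner your job uses, and the hard-coded `map_outputs` are made-up sample data.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    """Pick the reducer for a key; CRC32 is a stand-in partitioner (assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs, num_reducers):
    """Route every (key, value) pair to its reducer and group the values by key."""
    reducer_inputs = [defaultdict(list) for _ in range(num_reducers)]
    for output in map_outputs:
        for key, value in output:
            reducer_inputs[partition(key, num_reducers)][key].append(value)
    return reducer_inputs


def reduce_counts(grouped):
    """Word-count reduce function: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Made-up output of two map tasks.
map_outputs = [
    [("the", 1), ("fox", 1), ("the", 1)],
    [("dog", 1), ("the", 1), ("fox", 1)],
]

reducer_inputs = shuffle(map_outputs, num_reducers=2)
reducer_outputs = [reduce_counts(grouped) for grouped in reducer_inputs]

# Each reducer writes its own part of the result; merged here only for display.
totals = {}
for part in reducer_outputs:
    totals.update(part)
print(totals)
```

Because all pairs with the same key go to the same reducer, each word is counted in exactly one place, and the per-reducer outputs can simply be placed side by side.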