I noticed that there are two sets of Hadoop configuration parameters: one with mapred.* and the other with mapreduce.*. I am guessing these might be due to the old API vs. the new API, but if I am not mistaken, they seem to coexist in the new API. Am I correct? If so, is there a general rule for what mapred.* is used for and what mapreduce.* is used for?
Examining the source for 0.20.2, there are only a few mapreduce.* properties, and they revolve around configuring the job's input/output format, mapper/combiner/reducer, and partitioner classes. They also signal to the job client that the user is using the new API (look through the source for o.a.h.mapreduce.Job and its setUseNewAPI() method):

mapreduce.inputformat.class
mapreduce.outputformat.class
mapreduce.partitioner.class
mapreduce.map.class
mapreduce.combine.class
mapreduce.reduce.class

There are some more properties, but they are secondary configuration.
The input and output formats, whether new or old API versions, typically use mapred.* properties.

For example, to specify your MapReduce input paths you use mapred.input.dir (whether you're using the new or old API). The same goes for the output property, mapred.output.dir.

So the long and the short of it is: if there isn't a utility method to configure the property (such as FileInputFormat.setInputPaths(Job, String)), then you'll need to check the source.
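As a complement to reading the source, it can help to look at which property family a job's configuration actually contains. The following is only a minimal sketch: PropertyFamilies is a hypothetical helper (not part of Hadoop), it uses a plain java.util.Map as a stand-in for the Hadoop Configuration so that it runs without Hadoop on the classpath, and the property values are made-up placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of Hadoop.
class PropertyFamilies {

    // Returns the text before the first '.', e.g. "mapred" for "mapred.input.dir".
    // A key without a '.' is returned unchanged.
    static String family(String key) {
        int dot = key.indexOf('.');
        return dot < 0 ? key : key.substring(0, dot);
    }

    // Groups property names by family; families and names are sorted alphabetically.
    static Map<String, Set<String>> groupByFamily(Map<String, String> conf) {
        Map<String, Set<String>> groups = new TreeMap<>();
        for (String key : conf.keySet()) {
            groups.computeIfAbsent(family(key), f -> new TreeSet<>()).add(key);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Stand-in for a job configuration; all values are placeholders.
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("mapreduce.map.class", "com.example.MyMapper"); // hypothetical class name
        conf.put("mapreduce.inputformat.class", "com.example.MyInputFormat"); // hypothetical class name
        conf.put("mapred.input.dir", "/data/in");
        conf.put("mapred.output.dir", "/data/out");

        // Prints one line per family:
        // mapred.* -> [mapred.input.dir, mapred.output.dir]
        // mapreduce.* -> [mapreduce.inputformat.class, mapreduce.map.class]
        groupByFamily(conf).forEach((fam, keys) -> System.out.println(fam + ".* -> " + keys));
    }
}
```

In a real job, the map could presumably be filled from job.getConfiguration(), since the Hadoop Configuration class is iterable over its key/value entries. That part is untested here and left as an assumption.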