What is the difference between calling a MapReduce job from main() and from ToolRunner.run()? When the main class, say MapReduce, extends Configured and implements Tool, what additional privileges do we get that we would not have if we simply ran the job from the main method? Thanks.
There are no extra privileges, but your command-line options are run through GenericOptionsParser, which lets you extract certain configuration properties and set them on a Configuration object:
http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/util/GenericOptionsParser.html
Basically, rather than parsing options yourself (using the index of each argument in the list), you can set Configuration properties explicitly from the command line with -D property=value options.
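With ToolRunner the driver becomes much more condensed, because the generic options have already been applied to the Configuration by the time run() is called.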
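The contrast between the two styles can be sketched without Hadoop on the classpath. The block below is only a simplified, stdlib-only model of the idea, not Hadoop's GenericOptionsParser: `java.util.Properties` stands in for Configuration, and the class name, method names, and `example.*` property names are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Simplified stand-in for the idea behind generic option parsing.
// NOT Hadoop's GenericOptionsParser; Properties plays the role of Configuration.
class GenericOptionsSketch {

    // Manual style: the meaning of each argument depends on its position.
    static Properties parseByIndex(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("example.input", args[0]);    // hypothetical property names
        conf.setProperty("example.reducers", args[1]);
        return conf;
    }

    // Generic style: every "-D key=value" pair is moved into conf;
    // whatever is left over is returned for the tool itself to interpret.
    static String[] parseGeneric(String[] args, Properties conf) {
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-D") && i + 1 < args.length) {
                String[] kv = args[++i].split("=", 2);
                conf.setProperty(kv[0], kv.length > 1 ? kv[1] : "");
            } else {
                remaining.add(args[i]);
            }
        }
        return remaining.toArray(new String[0]);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        String[] rest = parseGeneric(
            new String[] {"-D", "example.reducers=4", "in", "out"}, conf);
        // prints "4 in,out"
        System.out.println(conf.getProperty("example.reducers") + " " + String.join(",", rest));
    }
}
```

In a real driver, ToolRunner.run(new Configuration(), tool, args) does the equivalent of `parseGeneric` for you and hands only the remaining arguments to the tool's run(String[] args) method.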
One final word of warning, though: when using the Configuration returned by getConf(), create your Job object first, then pull its Configuration out with job.getConfiguration(). The Job constructor makes a copy of the Configuration object passed in, so if you make changes to the reference you passed in, your job will not see those changes.
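The copy behaviour described above can be modelled in a few lines. This is a stdlib-only sketch under stated assumptions: `JobSketch` is a hypothetical stand-in for Hadoop's Job, `java.util.Properties` stands in for Configuration, and the `example.*` keys are invented for the demonstration.

```java
import java.util.Properties;

// Model of the pitfall only. JobSketch is a hypothetical stand-in for Hadoop's Job,
// and Properties stands in for Configuration.
class JobSketch {
    private final Properties conf;

    // Like the Job constructor described above, take a private copy of the settings.
    JobSketch(Properties original) {
        this.conf = new Properties();
        this.conf.putAll(original);
    }

    // Counterpart of Job.getConfiguration(): exposes the copy the job actually uses.
    Properties getConfiguration() {
        return conf;
    }

    public static void main(String[] args) {
        Properties fromGetConf = new Properties();   // plays the role of getConf()
        JobSketch job = new JobSketch(fromGetConf);

        fromGetConf.setProperty("example.key", "lost");              // too late: the job holds a copy
        job.getConfiguration().setProperty("example.other", "seen"); // changes the job's own copy

        System.out.println(job.getConfiguration().getProperty("example.key"));   // prints "null"
        System.out.println(job.getConfiguration().getProperty("example.other")); // prints "seen"
    }
}
```

The same ordering applies in a real Tool: construct the Job from getConf() first, then make any further settings on job.getConfiguration() rather than on the object you passed in.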