I’m starting with the Hadoop framework; my task is to write a map-reduce application for the framework and submit it. I have to use version 0.22.0 of Hadoop. I’m just learning the basic concepts and the API. However, I find it very hard to learn and to program prototypes, because both the official documentation and the API javadocs are outdated, incomplete, generally chaotic, and in places non-existent.
Here are just a few things that I do not understand: The MapReduce tutorial for Hadoop 0.22.0 uses a constructor (here, line 101) of class Job that is deprecated. All the other constructors are also deprecated. There is no note in the javadocs about what should be used instead. There are static methods of class Job that return an instance of Job, but those methods are undocumented and they require an instance of the poorly documented class Cluster as a parameter. So after reading all that mess I still don’t know how to properly get an instance of Job. Any help on this is appreciated.
When I tried to find the answer in the tutorials for later versions, such as 1.0.4 stable, I found that the MapReduce tutorial for that version uses classes from the package org.apache.hadoop.mapred, which are all deprecated in version 0.22.0. That would make 0.22.0 more recent than 1.0.4. Please help me understand this, or suggest some better resources.
The Javadoc might be a bit confusing, so having a look at the source of the Job class will probably help you:
So you can use:
Note that instantiating the Job class in this way also creates a connection to the job tracker at the same time. If you want to defer this, you can have the connection initialized lazily by setting Cluster to null when creating the Job object. In that case the Job class makes the connection only when it is really needed (see further information here).
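The difference between the eager and the deferred connection can be illustrated with a small self-contained sketch. FakeCluster and FakeJob are hypothetical stand-ins written only for this illustration; they are not Hadoop's Cluster and Job classes, and the sketch assumes nothing beyond the behaviour described above: a null cluster means the connection is created on first use.

```java
public class LazyConnectionSketch {

    /** Hypothetical stand-in for a cluster connection; NOT Hadoop's Cluster class. */
    static class FakeCluster {
        FakeCluster() {
            // In a real system this is where the connection would be opened.
            System.out.println("connection opened");
        }
    }

    /** Hypothetical stand-in for a job; NOT Hadoop's Job class. */
    static class FakeJob {
        // Stays null until the connection is actually required.
        private FakeCluster cluster;

        private FakeJob(FakeCluster cluster) {
            this.cluster = cluster;
        }

        /** Factory method: pass a cluster to connect eagerly, or null to defer. */
        static FakeJob getInstance(FakeCluster cluster) {
            return new FakeJob(cluster);
        }

        boolean isConnected() {
            return cluster != null;
        }

        /** Connects on demand if no cluster was supplied at creation time. */
        void submit() {
            if (cluster == null) {
                cluster = new FakeCluster();
            }
            System.out.println("job submitted");
        }
    }

    public static void main(String[] args) {
        // Eager: the connection exists as soon as the job object does.
        FakeJob eager = FakeJob.getInstance(new FakeCluster());
        System.out.println("eager connected before submit: " + eager.isConnected());

        // Lazy: no connection until submit() needs one.
        FakeJob lazy = FakeJob.getInstance(null);
        System.out.println("lazy connected before submit: " + lazy.isConnected());
        lazy.submit();
        System.out.println("lazy connected after submit: " + lazy.isConnected());
    }
}
```

The lazy variant is useful when a job object is built and configured in a place where the cluster may not be reachable yet, for example in a unit test or a configuration step.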