I am struggling with a very basic issue with the "-file" option in Hadoop
streaming.
First I tried the basic streaming example:
    hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-streaming-0.20.203.0.jar \
        -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
        -reducer /bin/wc \
        -inputformat KeyValueTextInputFormat \
        -input gutenberg/* \
        -output gutenberg-outputtstchk22
This worked fine.
Then I copied the IdentityMapper.java source code, compiled it, placed the
resulting class file in the /home/hadoop folder and ran the following in the
terminal:
    hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-streaming-0.20.203.0.jar \
        -file ~/IdentityMapper.class \
        -mapper IdentityMapper.class \
        -reducer /bin/wc \
        -inputformat KeyValueTextInputFormat \
        -input gutenberg/* \
        -output gutenberg-outputtstch6
The execution failed with the following error in the stderr file:
    java.io.IOException: Cannot run program "IdentityMapper.class": java.io.IOException: error=2, No such file or directory
Next I copied the IdentityMapper.class file into the Hadoop installation
directory and ran:
    hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-streaming-0.20.203.0.jar \
        -file IdentityMapper.class \
        -mapper IdentityMapper.class \
        -reducer /bin/wc \
        -inputformat KeyValueTextInputFormat \
        -input gutenberg/* \
        -output gutenberg-outputtstch5
Unfortunately, I got the same error again.
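Both attempts fail for the same reason. Hadoop streaming runs the -mapper value as an external command unless it resolves to a Java class (as in the working example), and a command-based mapper is simply a program that reads input records line by line from stdin and writes output records to stdout. A .class file is not an executable program, so launching "IdentityMapper.class" fails with error=2. As a hypothetical illustration (the class name StreamIdentityMapper is made up and is not part of Hadoop), a Java program that behaves like a streaming identity mapper could be sketched like this:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical streaming-style identity mapper: every line read from stdin
 * is written unchanged to stdout, which is all a streaming mapper must do.
 */
class StreamIdentityMapper {

    /** Copies every line from in to out unchanged; returns the number of lines copied. */
    static int copyLines(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        BufferedWriter writer = new BufferedWriter(out);
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            writer.write(line);
            writer.write('\n'); // each output record is newline-terminated
            count++;
        }
        writer.flush();
        return count;
    }

    /** Convenience wrapper for in-memory input, e.g. for quick local checks. */
    static String mapText(String input) {
        StringWriter out = new StringWriter();
        try {
            copyLines(new StringReader(input), out);
        } catch (IOException e) {
            // Not expected for in-memory readers/writers
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Streaming feeds input records on stdin and collects output from stdout
        copyLines(new InputStreamReader(System.in, StandardCharsets.UTF_8),
                  new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
    }
}
```

Shipped with -file StreamIdentityMapper.class, such a program could presumably be invoked as -mapper "java -cp . StreamIdentityMapper", assuming java is on the PATH of every task node; that invocation is an untested assumption, not something confirmed for 0.20.203.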
I would appreciate any help with this, as I cannot make any further progress
until it is resolved.
Thanks in advance.
Why compile the class yourself? It is already compiled and included in the Hadoop jars. You only pass the class name (org.apache.hadoop.mapred.lib.IdentityMapper), because Hadoop uses reflection to create a new instance of the mapper class from that name.
For your own mapper class, you have to make sure it is on the classpath, e.g. inside a jar that you pass along with the job.
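To illustrate the reflection point, here is a minimal sketch using plain java.lang reflection (not Hadoop's actual code, which goes through its own utility classes). The class name ReflectiveLoader is hypothetical. It turns a fully qualified class name into an instance, and shows that a file name such as IdentityMapper.class cannot be resolved as a class:

```java
/**
 * Hypothetical illustration: create an object from a class *name* at runtime,
 * the same basic idea Hadoop relies on when -mapper names a Java class.
 */
class ReflectiveLoader {

    /** Loads className from the classpath and calls its no-arg constructor. */
    static Object instantiate(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // e.g. ClassNotFoundException when the class is not on the classpath
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }

    /** Returns true if className can be loaded from the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A fully qualified class name, analogous to org.apache.hadoop.mapred.lib.IdentityMapper
        System.out.println(instantiate("java.util.ArrayList").getClass().getName()); // prints java.util.ArrayList
        // A file name is not a class name, so the lookup fails
        System.out.println(isOnClasspath("IdentityMapper.class")); // prints false
    }
}
```

The same lookup fails for any class whose jar is not on the classpath, which is why a custom mapper class has to be shipped in a jar the job can see.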