I am totally confused by the Hadoop API (I guess it's changing all the time).
If I am not wrong, JobConf was deprecated and we were supposed to use the Job and Configuration classes instead to run a MapReduce job from Java. It seems, though, that in the recently released Hadoop 1.0.0 JobConf is no longer deprecated!
So I am using the Job and Configuration classes to run a MapReduce job. Now I need to put the reducers' output files in a folder structure based on certain values that are part of my map output. I went through several articles and found that one can achieve this with an OutputFormat class, but this class exists in two packages:
org.apache.hadoop.mapred and
org.apache.hadoop.mapreduce
On our Job object we can set an output format class like this:
job.setOutputFormatClass(SomeOutputFormat.class);
Now, if SomeOutputFormat extends, say, org.apache.hadoop.mapreduce.lib.output.FileOutputFormat, we get one method to implement, getRecordWriter(), and I don't see how that helps to override the output path.
There is another way using JobConf, but that again does not seem to work when it comes to setting the mapper, reducer, partitioner, sorting and grouping classes.
Is there something very obvious that I am missing? I want to write my reduce output file inside a folder whose path is based on a value, for example: SomeOutputPrefix/Value1/Value2/realReduceFileName
Thanks!
I think you need to implement
So your SomeOutputWriter will return new SomeRecordWriter("SomeOutputPrefix") in its getRecordWriter() method, and SomeRecordWriter will write different values to different folders.
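To illustrate the routing idea, here is a minimal sketch of what such a writer might do. It is a plain-Java stand-in with no Hadoop dependency, so it is not the real thing: in an actual job the class would extend org.apache.hadoop.mapreduce.RecordWriter<K, V> and open its streams through Hadoop's FileSystem rather than java.nio. The two-argument constructor, the fileName parameter, and the convention that the key carries the relative folder (e.g. "Value1/Value2") are assumptions made up for this example.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for SomeRecordWriter: routes each record to
 * outputPrefix/<folderKey>/<fileName>, keeping one open writer per folder.
 */
class SomeRecordWriter implements AutoCloseable {
    private final Path outputPrefix;   // e.g. SomeOutputPrefix
    private final String fileName;     // e.g. realReduceFileName (assumed parameter)
    private final Map<String, BufferedWriter> writers = new HashMap<>();

    SomeRecordWriter(Path outputPrefix, String fileName) {
        this.outputPrefix = outputPrefix;
        this.fileName = fileName;
    }

    /** Appends value as one line to the file under the folder named by folderKey. */
    void write(String folderKey, String value) throws IOException {
        BufferedWriter writer = writers.get(folderKey);
        if (writer == null) {
            // First record for this folder: create the directory and open the file.
            Path dir = outputPrefix.resolve(folderKey);
            Files.createDirectories(dir);
            writer = Files.newBufferedWriter(dir.resolve(fileName), StandardCharsets.UTF_8);
            writers.put(folderKey, writer);
        }
        writer.write(value);
        writer.newLine();
    }

    /** Closes every file that was opened by write(). */
    @Override
    public void close() throws IOException {
        for (BufferedWriter writer : writers.values()) {
            writer.close();
        }
        writers.clear();
    }

    public static void main(String[] args) throws IOException {
        Path prefix = Files.createTempDirectory("SomeOutputPrefix");
        try (SomeRecordWriter out = new SomeRecordWriter(prefix, "realReduceFileName")) {
            out.write("Value1/Value2", "record for the first folder");
            out.write("Value1/Value3", "record for the second folder");
        }
        System.out.println("Output written under " + prefix);
    }
}
```

The writer keeps a map of open files so that records for the same folder end up in the same file, and everything is flushed in close(). In a real reducer you would also want each task to use a distinct file name so that parallel tasks do not overwrite each other.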