I have a SequenceFile whose keys and values are of type "org.apache.hadoop.typedbytes.TypedBytesWritable". I have to provide this file as the input to a Hadoop job and process it in the map phase only; nothing I need to do requires a reduce step.
1) How do I specify SequenceFile as the input format?
2) What will the signature of the map function be?
3) How do I get the output from the map instead of the reduce?
Set SequenceFileAsBinaryInputFormat as the input format.
The map would be invoked with BytesWritable as both the key and the value type.
Set the mapred.reduce.tasks property to 0. The output of the map will then be the final output of the job.
Also, take a look at SequenceFileAsTextInputFormat. With it, the map would be invoked with Text as both the key and the value type.
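To tie the three answers together, here is a minimal sketch of a map-only job using the old org.apache.hadoop.mapred API. The class names MapOnlyJob and RawRecordMapper, the job name, and the choice to emit byte lengths are all made up for illustration; the input and output paths are assumed to come from the command line. Note that with SequenceFileAsBinaryInputFormat the map receives the raw serialized bytes of each record, so you would still have to decode the TypedBytesWritable content yourself.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.SequenceFileAsBinaryInputFormat;

// Hypothetical driver class; the name is an assumption for this sketch.
public class MapOnlyJob {

    // Hypothetical mapper: key and value arrive as raw BytesWritable.
    // It only emits the serialized length of each key and value.
    public static class RawRecordMapper extends MapReduceBase
            implements Mapper<BytesWritable, BytesWritable, IntWritable, IntWritable> {

        @Override
        public void map(BytesWritable key, BytesWritable value,
                        OutputCollector<IntWritable, IntWritable> output,
                        Reporter reporter) throws IOException {
            output.collect(new IntWritable(key.getLength()),
                           new IntWritable(value.getLength()));
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(MapOnlyJob.class);
        conf.setJobName("map-only-sequence-file"); // arbitrary name

        // 1) Read the SequenceFile as raw bytes.
        conf.setInputFormat(SequenceFileAsBinaryInputFormat.class);

        // 2) Mapper with BytesWritable key and value types.
        conf.setMapperClass(RawRecordMapper.class);

        // 3) Zero reduce tasks: map output is written directly as job output.
        conf.setNumReduceTasks(0);

        conf.setOutputKeyClass(IntWritable.class);
        conf.setOutputValueClass(IntWritable.class);

        // Assumed usage: args[0] is the input path, args[1] the output directory.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
```

Calling setNumReduceTasks(0) is the programmatic way of setting the reduce task count to 0. If you would rather work with strings, swapping in SequenceFileAsTextInputFormat and changing the mapper's input types to Text should be the only changes needed on the input side.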