I have a multi-stage (multi-job) MapReduce program. My first input has to be TextInputFormat and my last output has to be TextOutputFormat. What I would like to do is convert the format from Text to SequenceFile within the first job, like this:
TextInputFormat
Job1.execute()
SequenceFileOutputFormat
SequenceFileInputFormat
Job2.execute()
SequenceFileOutputFormat
...
SequenceFileInputFormat
JobLast.execute()
TextOutputFormat
In all the examples I have found, this is achieved by creating additional jobs: one that simply writes the input as a SequenceFile, and another that reads the SequenceFile and stores it in a different format. Can this be done without the additional jobs? Can I do something like this:
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(SequenceFileOutputFormat.class);
That is, can the conversion happen while the job is actually performing its computation? How can I achieve this without creating two additional jobs (write and read)?
Problem solved. It was a mistake in my code, sorry about that.
You can certainly store the output in any form you want. You don’t really need a separate job for that.
SequenceFileOutputFormat can store any type of key and value, so simply stating conf.setOutputFormat(SequenceFileOutputFormat.class); should do the trick. Have you tried it? Didn’t it work? Just make sure the input key and value classes of your next map job are compatible with the output key/value classes you used for the previous reducer.
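To make that concrete, here is a minimal sketch of a two-job chain using the old org.apache.hadoop.mapred API (the one that conf.setInputFormat in the question belongs to). The class name ChainedJobsSketch, the method names, and the three path arguments are made up for illustration. No mapper or reducer is set, so the default identity classes are assumed, which is why the key/value types are LongWritable and Text. In a real pipeline you would add your own setMapperClass/setReducerClass calls and adjust the key/value classes to match.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

// Hypothetical driver: class and method names are illustrative only.
public class ChainedJobsSketch {

    // First job: reads plain text, writes a SequenceFile while doing its real work.
    public static JobConf firstJob(Path textIn, Path seqOut) {
        JobConf conf = new JobConf(ChainedJobsSketch.class);
        conf.setJobName("first");
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(SequenceFileOutputFormat.class);
        // With the default identity mapper/reducer, TextInputFormat's
        // (LongWritable offset, Text line) pairs pass straight through.
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(conf, textIn);
        FileOutputFormat.setOutputPath(conf, seqOut);
        return conf;
    }

    // Last job: reads the SequenceFile written by the previous job, writes text.
    public static JobConf lastJob(Path seqIn, Path textOut) {
        JobConf conf = new JobConf(ChainedJobsSketch.class);
        conf.setJobName("last");
        conf.setInputFormat(SequenceFileInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        // Input types here are whatever the previous job declared as its
        // output key/value classes; they must be compatible.
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(conf, seqIn);
        FileOutputFormat.setOutputPath(conf, textOut);
        return conf;
    }

    public static void main(String[] args) throws IOException {
        Path in = new Path(args[0]);   // text input
        Path tmp = new Path(args[1]);  // intermediate SequenceFile directory
        Path out = new Path(args[2]);  // final text output
        // runJob blocks until the job finishes, so the jobs run in order.
        JobClient.runJob(firstJob(in, tmp));
        JobClient.runJob(lastJob(tmp, out));
    }
}
```

Any intermediate jobs would follow the same pattern, with SequenceFileInputFormat on the way in and SequenceFileOutputFormat on the way out, so no separate conversion jobs are needed.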