I currently have a MapReduce job that uses MultipleOutputs to send data to several

Question

Asked: May 22, 2026 (2026-05-22T02:39:17+00:00)

I currently have a MapReduce job that uses MultipleOutputs to send data to several HDFS locations. After that completes, I am using HBase client calls (outside of MR) to add some of the same elements to a few HBase tables. It would be nice to add the HBase outputs as just additional MultipleOutputs, using TableOutputFormat. In that way, I would distribute my HBase processing.

The problem is that I cannot get this to work. Has anyone ever used TableOutputFormat with MultipleOutputs, and with multiple HBase outputs?

Basically, I am setting up my collectors like this:

OutputCollector<ImmutableBytesWritable, Writable> hbaseCollector1 = multipleOutputs.getCollector("hbase1", reporter);
OutputCollector<ImmutableBytesWritable, Writable> hbaseCollector2 = multipleOutputs.getCollector("hbase2", reporter);

// The key must match the collector's declared key type, so use the row key rather than NullWritable.
Put put = new Put(mykey.getBytes());
put.add("family".getBytes(), "column".getBytes(), somedata1);
hbaseCollector1.collect(new ImmutableBytesWritable(mykey.getBytes()), put);

put = new Put(mykey.getBytes());
put.add("family".getBytes(), "column".getBytes(), somedata2);
hbaseCollector2.collect(new ImmutableBytesWritable(mykey.getBytes()), put);

This seems to follow the general idea of hbase writing, I think.

Part of the issue, as I type this, might be in the job definition. It looks like MR (and HBase) want a single global parameter set, like this:

conf.set(TableOutputFormat.OUTPUT_TABLE, "articles");

to provide the table name. The trouble is, I have two tables.

Any ideas?

Thanks

1 Answer

Editorial Team · Answer 1 · 2026-05-22T02:39:18+00:00

So, apparently, this is not possible with the old mapred packages. There is a new OutputFormat in the mapreduce package set, but I don't want to convert to that right now, so I will have to write multiple MR jobs.
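
For anyone landing here later: the answer does not name the newer OutputFormat. One candidate is HBase's MultiTableOutputFormat in org.apache.hadoop.hbase.mapreduce, which, as far as I know, treats the output key as the destination table name instead of reading a single global OUTPUT_TABLE setting. The sketch below is only a toy model of that routing idea in plain Java, with no Hadoop or HBase classes. The class TableRoutingSketch, its methods, and the "authors" table name are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model (no HBase classes): each record's key names the table it belongs to. */
public class TableRoutingSketch {

    /** Hypothetical stand-in for a per-table writer; it only buffers rows in memory. */
    static final class TableBuffer {
        final List<String> rows = new ArrayList<>();
    }

    // One buffer per table name, created lazily on first write.
    private final Map<String, TableBuffer> buffers = new HashMap<>();

    /** The key (table name) selects the destination, so no global table setting is needed. */
    public void write(String tableName, String row) {
        buffers.computeIfAbsent(tableName, name -> new TableBuffer()).rows.add(row);
    }

    /** Returns the rows buffered for a table, or an empty list if nothing was written to it. */
    public List<String> rowsFor(String tableName) {
        TableBuffer buffer = buffers.get(tableName);
        return buffer == null ? Collections.<String>emptyList() : buffer.rows;
    }

    public static void main(String[] args) {
        TableRoutingSketch out = new TableRoutingSketch();
        out.write("articles", "mykey:somedata1"); // table name taken from the question
        out.write("authors", "mykey:somedata2");  // hypothetical second table
        System.out.println("articles rows: " + out.rowsFor("articles").size());
        System.out.println("authors rows: " + out.rowsFor("authors").size());
    }
}
```

In a real job the two write calls would correspond to emitting a Put with the table name as the key, so a single job could feed both tables. Whether that fits alongside the existing HDFS outputs depends on how the job is migrated to the mapreduce API, which I have not tried here.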
