In certain criteria we want the mapper do all the work and output to

Asked: June 2, 2026 (2026-06-02T19:37:17+00:00)

In certain cases we want the mapper to do all the work and write its output to HDFS. We don't want the data transmitted to a reducer, since that uses extra bandwidth (please correct me if there is a case where this is wrong).

Pseudocode would be:

def mapper(k,v_list):
  for v in v_list:
    if criteria:
      write to HDFS
    else:
      emit

I found this hard because the only thing we can work with is OutputCollector.
One idea is to extend OutputCollector, override OutputCollector.collect, and do the work there.
Is there a better way?

1 Answer

Editorial Team · Answer 1 · 2026-06-02T19:37:19+00:00

You can just set the number of reduce tasks to 0 by using JobConf.setNumReduceTasks(0). This will make the results of the mapper go straight into HDFS.

From the Map-Reduce manual: http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html

Reducer NONE
It is legal to set the number of reduce-tasks to zero if no reduction is desired.

In this case the outputs of the map-tasks go directly to the FileSystem, 
into the output path set by setOutputPath(Path). The framework does not sort 
the map-outputs before writing them out to the FileSystem.
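
For the mixed case in the question, where only some values should skip the reducer, the mapper's routing could be sketched roughly as below. This is a minimal Python sketch in the style of the question's pseudocode, not Hadoop API code. The names `route_values`, `criteria`, `side_sink` and `emit` are hypothetical: `side_sink` stands in for a file opened on HDFS, and `emit` stands in for `OutputCollector.collect`.

```python
import io
from typing import Callable, Iterable, TextIO


def route_values(values: Iterable[str],
                 criteria: Callable[[str], bool],
                 side_sink: TextIO,
                 emit: Callable[[str], None]) -> None:
    """Send each value either to a side sink or to the normal map output."""
    for v in values:
        if criteria(v):
            # Assumption: in a real job this would be a writer on HDFS.
            side_sink.write(v + "\n")
        else:
            # Assumption: in a real job this would be OutputCollector.collect.
            emit(v)


if __name__ == "__main__":
    side = io.StringIO()   # in-memory stand-in for the HDFS file
    emitted = []           # in-memory stand-in for the collector
    route_values(["keep-1", "side-2", "keep-3"],
                 lambda v: v.startswith("side-"),
                 side, emitted.append)
    print("side sink:", side.getvalue().split())
    print("emitted:", emitted)
```

If every value meets the criteria, nothing is emitted and the job is effectively map-only, which is the case that setting the number of reduce tasks to 0 covers.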
