Is there any way for each reducer process to determine the number of elements or records it has to process?


Editorial Team

Asked: June 16, 2026


Is there any way for each reducer process to determine the number of elements or records it has to process?


1 Answer

Editorial Team · Answer 1 · 2026-06-16T09:19:17+00:00

Your reducer class must extend the Hadoop MapReduce Reducer class:

org.apache.hadoop.mapreduce.Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT>

and then override the reduce method, using the KEYIN/VALUEIN type arguments you specified on the extended Reducer class:

protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context) throws IOException, InterruptedException

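Within a single reduce() call, the values associated with that call's key can be counted like this:

int count = 0;
Iterator<VALUEIN> it = values.iterator(); // java.util.Iterator
while (it.hasNext()) {
  it.next(); // advance past the value; only the count matters here
  count++;
}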
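To try the counting loop outside Hadoop, here is a minimal sketch that lifts it into a generic helper over any Iterable. The class and method names (ValueCounter, countValues) are hypothetical, and a plain java.util.List stands in for the values Iterable that Hadoop would pass to reduce().

```java
import java.util.Iterator;

// Hypothetical helper: the same loop as above, lifted out of reduce()
// so it can be exercised without a Hadoop cluster.
class ValueCounter {
    // Counts the elements of an Iterable by walking its iterator once.
    static <T> int countValues(Iterable<T> values) {
        int count = 0;
        Iterator<T> it = values.iterator();
        while (it.hasNext()) {
            it.next(); // consume the value; only the count matters here
            count++;
        }
        return count;
    }
}
```

For example, ValueCounter.countValues(Arrays.asList("a", "b", "c")) returns 3.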

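That said, I'd suggest doing this counting alongside your other processing so you don't make two passes through your values. In Hadoop the values Iterable handed to reduce() can generally only be traversed once, so a second loop over it will not see the values again.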
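The sketch below illustrates why a single pass matters. It uses a hypothetical OneShotIterable that, as an assumption modelled on how Hadoop streams reducer input, hands back the same iterator every time iterator() is called, so a second loop sees nothing. The hypothetical countAndSum method counts and sums integer values in one traversal, while twoPassCountAndSum shows what a separate second pass would produce.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the values Iterable passed to reduce(): it returns
// the SAME iterator on every call, so it can only be traversed once
// (an assumption that mimics Hadoop's streamed reducer input).
class OneShotIterable<T> implements Iterable<T> {
    private final Iterator<T> it;

    OneShotIterable(List<T> items) {
        this.it = items.iterator();
    }

    @Override
    public Iterator<T> iterator() {
        return it;
    }
}

class SinglePass {
    // Counts and sums the values in one traversal; returns {count, sum}.
    static long[] countAndSum(Iterable<Integer> values) {
        long count = 0;
        long sum = 0;
        for (int v : values) {
            count++;  // counting happens alongside the real work...
            sum += v; // ...so the values are walked only once
        }
        return new long[] {count, sum};
    }

    // Counts in a first pass, then sums in a second pass; returns {count, sum}.
    // Against a one-shot Iterable the second loop sees no values, so sum stays 0.
    static long[] twoPassCountAndSum(Iterable<Integer> values) {
        long count = 0;
        for (Integer ignored : values) {
            count++;
        }
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return new long[] {count, sum};
    }
}
```

With the values 1, 2, 3, countAndSum yields a count of 3 and a sum of 6, whereas twoPassCountAndSum yields a count of 3 but a sum of 0.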

EDIT

Here's an example of a vector of vectors that grows dynamically as you add to it, so you don't have to declare fixed-size arrays up front and therefore don't need to know the size of the values set. This works best for irregular data (i.e. when the number of columns is not the same for every row in your input CSV file), but it has the most overhead.

Vector<Vector<String>> table = new Vector<>(); // java.util.Vector; grows as rows are added

Iterator<Text> it = values.iterator();
while (it.hasNext()) {

  Text t = it.next();
  // toString() copies the contents, which matters because Hadoop reuses the Text object
  String[] cols = t.toString().split(",");

  Vector<String> row = new Vector<>(); // new vector will be our row
  for (String col : cols) {
    row.addElement(col); // here we're adding a new column for every value in the CSV row
  }

  table.addElement(row);
}

Then you can access the Mth column of the Nth row (both zero-based) via

table.get(N).get(M);
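A self-contained version of the same idea, runnable without Hadoop, is sketched below. It assumes the rows arrive as plain Strings rather than Text objects, and the names CsvTable and buildTable are hypothetical. Rows may have different numbers of columns, and a cell is read with table.get(N).get(M) as above.

```java
import java.util.List;
import java.util.Vector;

// Hypothetical helper: builds a ragged "vector of vectors" from CSV lines.
class CsvTable {
    static Vector<Vector<String>> buildTable(List<String> lines) {
        Vector<Vector<String>> table = new Vector<>();
        for (String line : lines) {
            Vector<String> row = new Vector<>(); // one row per input line
            for (String col : line.split(",")) {
                row.addElement(col);             // one column per CSV field
            }
            table.addElement(row);
        }
        return table;
    }
}
```

For the lines "a,b,c" and "d,e", the first row has three columns, the second has two, and table.get(1).get(1) is "e".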

If you know the number of columns is fixed, you could modify this to use a Vector of arrays, which would probably be a little faster and more space-efficient.
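Assuming every row has exactly the same number of columns, the Vector-of-arrays variant might look like the sketch below. FixedWidthTable and its numCols parameter are hypothetical names. Each row is stored as a String[] of fixed length, and a row with the wrong number of fields is rejected rather than silently stored.

```java
import java.util.List;
import java.util.Vector;

// Hypothetical helper: stores each CSV row as a fixed-length String[].
class FixedWidthTable {
    static Vector<String[]> buildTable(List<String> lines, int numCols) {
        Vector<String[]> table = new Vector<>();
        for (String line : lines) {
            // limit -1 keeps trailing empty fields, so "a,b," yields 3 fields
            String[] cols = line.split(",", -1);
            if (cols.length != numCols) {
                throw new IllegalArgumentException(
                    "expected " + numCols + " columns but got " + cols.length + ": " + line);
            }
            table.addElement(cols);
        }
        return table;
    }
}
```

Cells are then read as table.get(N)[M], which avoids the per-row Vector overhead.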
