I am new to MapReduce and would like your opinion on the best MapReduce

Editorial Team

Asked: June 13, 2026

I am new to MapReduce and would like your opinion on the best MapReduce approach for the following task.

I have a single large document in the following format:

1 2 3
2
2 3 4 5

Each line holds a list of numbers. I want to list every pair of numbers that occurs together on any line (a number also pairs with itself), along with the number of lines containing that pair.

The result should look like this:

element1 element2 occurrences
1        1        1
1        2        1
1        3        1
2        2        3
2        3        2
2        4        1
2        5        1
3        3        2
3        4        1
3        5        1
4        4        1
4        5        1
5        5        1

The document has about 2M lines and about 1.5M distinct numbers, which gives about 2.5G distinct pairs of numbers to count.

The straightforward pseudocode looks like this, with Map invoked for each line in the document:

Map(int lineId, list<int> elements)
{
  for each pair of integers in elements
    emit(pair, 1)
}

Reduce((int, int) pair, list<int> counts)
{
  return sum(counts)
}

But this approach will require 2M mappers and 2.5G reducers. Is this a plausible way to go?
Planning on trying Hadoop on Azure.

1 Answer

Editorial Team · Answer 1 · 2026-06-13T12:55:07+00:00

But this approach will require 2M mappers and 2.5G reducers. Is this a plausible way to go? Planning on trying Hadoop on Azure.

This assumption is not correct.

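With FileInputFormat, the number of mappers equals the number of input splits, not the number of lines. An input split typically corresponds to one HDFS block, whose size defaults to 64 MB (128 MB in Hadoop 2 and later). So, with 64 MB blocks, a 1024 MB input file launches 16 map tasks, and each map task calls the map function once for every line in its split.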
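
The arithmetic above can be sketched as a small Python helper. This is only a rough estimate: it assumes one input split per block and ignores finer details of how Hadoop sizes the last split. The function name estimate_map_tasks is hypothetical, not part of any Hadoop API.

```python
import math

def estimate_map_tasks(file_size_mb: float, split_size_mb: float = 64) -> int:
    """Rough number of map tasks for one file, assuming one split per block."""
    if file_size_mb <= 0:
        return 0
    # A partially filled last block still needs its own split.
    return math.ceil(file_size_mb / split_size_mb)

print(estimate_map_tasks(1024))       # the 1024 MB example with 64 MB blocks
print(estimate_map_tasks(1024, 128))  # the same file with 128 MB blocks
```

The point is that the map task count depends on the size of the input in bytes, so a file with 2M short lines may need only a handful of map tasks.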

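The number of reducers is configurable through the mapred.reduce.tasks parameter (mapreduce.job.reduces in Hadoop 2 and later), which defaults to 1. A reduce task is not created per key: each reduce task processes many keys and calls the reduce function once for each of them. Also note that a combiner can make the job complete faster, because it sums the counts on the map side and so reduces the amount of data sent to the reducers.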
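
To make this concrete, here is a minimal local simulation in plain Python, not Hadoop code. It runs the pair-counting job from the question over two splits with a combiner and a fixed number of reducers. The names map_line, combine, partition, run_job and the value NUM_REDUCERS = 2 are assumptions chosen for illustration, and the hash-based partition only imitates what a hash partitioner does.

```python
from collections import Counter
from itertools import combinations_with_replacement

NUM_REDUCERS = 2  # assumed value; plays the role of mapred.reduce.tasks

def map_line(line):
    """Emit ((a, b), 1) for every pair a <= b of the distinct numbers on a line."""
    elements = sorted(set(int(tok) for tok in line.split()))
    for pair in combinations_with_replacement(elements, 2):
        yield pair, 1

def combine(records):
    """Combiner: pre-sum the counts produced by one map task before the shuffle."""
    local = Counter()
    for pair, count in records:
        local[pair] += count
    return local.items()

def partition(pair):
    """Pick the reducer for a key, in the spirit of a hash partitioner."""
    return hash(pair) % NUM_REDUCERS

def run_job(splits):
    # Shuffle: every reducer receives the records for many different keys.
    reducer_inputs = [[] for _ in range(NUM_REDUCERS)]
    for split in splits:  # one map task per split, not per line
        mapped = (kv for line in split for kv in map_line(line))
        for pair, count in combine(mapped):
            reducer_inputs[partition(pair)].append((pair, count))
    # Reduce: sum the partial counts for each key.
    result = Counter()
    for records in reducer_inputs:
        for pair, count in records:
            result[pair] += count
    return dict(result)

# The three example lines from the question, divided into two splits.
counts = run_job([["1 2 3", "2"], ["2 3 4 5"]])
for (a, b), n in sorted(counts.items()):
    print(a, b, n)
```

In this sketch, two map tasks and two reducers handle all pairs, which is the same relationship that lets a real cluster count 2.5G distinct pairs with a modest, configurable number of tasks.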

I would suggest reading Hadoop: The Definitive Guide for a better understanding of MapReduce and Hadoop.
