Editorial Team
Asked: June 13, 2026

I am new to MapReduce and would like your opinion on the best MapReduce

I am new to MapReduce and would like your opinion on the best MapReduce approach for the following task.

I have a single large document in the format

1 2 3
2
2 3 4 5

Each line has a list of numbers. I want to list every pair of numbers that appears together on any line (a number paired with itself counts too), along with the number of lines containing that pair.

For the sample above, the result should look like this:

element1 element2 occurrences
1        1        1
1        2        1
1        3        1
2        2        3
2        3        2
2        4        1
2        5        1
3        3        2
3        4        1
3        5        1
4        4        1
4        5        1
5        5        1

There are about 2M lines in the document and about 1.5M distinct numbers, so there will be about 2.5G distinct pairs of numbers to count.

The straightforward pseudocode looks like this, with map invoked once for each line in the document:

Map(int lineId, list<int> elements)
{
  for each pair of integers in elements
    emit(pair, 1)
}

Reduce((int, int) pair, list<int> counts)
{
  return sum(counts)
}

But this approach would require 2M mappers and 2.5G reducers. Is this a plausible way to go?
I am planning to try Hadoop on Azure.

1 Answer

    Editorial Team
    Added an answer on June 13, 2026 at 12:55 pm

    But this approach will require 2M mappers and 2.5G reducers. Is this a plausible way to go? Planning on trying Hadoop on Azure.

    This assumption is not correct.

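    The map function is called once per input record (here, once per line), but a single map task processes a whole input split. For FileInputFormat, the number of map tasks equals the number of input splits. A split typically corresponds to one HDFS block, which defaults to 64 MB in Hadoop 1.x (128 MB in Hadoop 2 and later). So with 64 MB blocks, a 1024 MB input file launches 16 map tasks, not one per line.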
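The split arithmetic can be checked with a short sketch. It assumes exactly one input split per HDFS block, which is the usual case for a large splittable file; `estimate_map_tasks` is a hypothetical helper written for illustration, not a Hadoop API.

```python
import math

def estimate_map_tasks(file_size_mb: float, block_size_mb: float = 64) -> int:
    """Estimate the number of map tasks, assuming one input split per HDFS block."""
    if file_size_mb <= 0:
        return 0
    # A partially filled last block still needs its own split, hence the ceiling.
    return math.ceil(file_size_mb / block_size_mb)

print(estimate_map_tasks(1024))       # 1024 MB file, 64 MB blocks
print(estimate_map_tasks(1024, 128))  # same file, 128 MB blocks
```

The task count depends only on the input size and the block size, never on the number of lines in the file.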

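    The reduce function is called once per key, but a single reduce task handles many keys. The number of reduce tasks is set with the mapred.reduce.tasks parameter (mapreduce.job.reduces in Hadoop 2 and later), which defaults to 1. Also note that a combiner, which pre-aggregates map output on the map side before the shuffle, can make the job complete faster.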
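To see how a handful of tasks covers many keys, here is a minimal local simulation in plain Python, not Hadoop code. It assumes each pair is counted once per line and that a number is paired with itself, as in the asker's expected output. The names `map_line`, `combine`, and `run_job` are hypothetical, and hashing the key modulo the reducer count only stands in for a real partitioner.

```python
from collections import Counter, defaultdict
from itertools import combinations_with_replacement

def map_line(line):
    """Emit ((a, b), 1) once for every distinct pair a <= b on the line."""
    elements = sorted(set(int(tok) for tok in line.split()))
    for pair in combinations_with_replacement(elements, 2):
        yield pair, 1

def combine(pairs):
    """Combiner: pre-sum counts within one map task to shrink shuffle traffic."""
    local = Counter()
    for pair, count in pairs:
        local[pair] += count
    return local

def run_job(splits, num_reducers=2):
    """Simulate the job: each split is one map task, keys are hashed to reducers."""
    partitions = [defaultdict(int) for _ in range(num_reducers)]
    for split in splits:
        emitted = (kv for line in split for kv in map_line(line))
        for pair, count in combine(emitted).items():
            # Each reducer sums the counts for every key routed to it.
            partitions[hash(pair) % num_reducers][pair] += count
    result = {}
    for part in partitions:
        result.update(part)
    return result

# Two map tasks (splits) and two reducers handle all distinct pairs.
splits = [["1 2 3", "2"], ["2 3 4 5"]]
for (a, b), n in sorted(run_job(splits).items()):
    print(a, b, n)
```

Changing `num_reducers` changes only how the keys are spread across reducers, not the final counts. Note that a line with n distinct numbers emits n(n+1)/2 pairs, so long lines dominate the shuffle volume.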

    I would suggest reading Hadoop: The Definitive Guide for a better understanding of MapReduce and Hadoop.
