The Archive Base

Editorial Team
Asked: May 30, 2026

Why is Reduce required in MapReduce?

Why is Reduce required in MapReduce? If a job such as counting the words in a book gives the same result whether it is run by a single process or MapReduced over a farm of servers, why would duplicates ever have to be removed? I assume the Reduce step, at least in this example, would simply SUM the output of each worker process and deliver the total count of words in the book. I don’t understand where duplicates of anything come into the picture.

1 Answer

  1. Editorial Team
    Added an answer on May 30, 2026 at 4:51 pm

    The Reduce step is not meant for removing duplicates (although that is a possible use case in some situations). Reduce is meant for aggregation of outputs from various mappers with the same key.

    In the word count example, Node 1 might see 10 instances of a word, say “school”, Node 2 might see 15 instances, and Node 3 might see 12. How will the sum be calculated? The results 10, 15 and 12 sit on different nodes. A shuffle phase brings all these values to one node (the reducer that the partitioner assigns to the key “school”). That reducer then has all the values for this key and can sum them.
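
    The flow above can be sketched as a small single-process simulation. This is only an illustration, not how a real cluster is implemented: the extra word “book”, the choice of two reducers, and the use of `zlib.crc32` as the partitioner's hash are all assumptions made for the example.

```python
from collections import defaultdict
import zlib

# Hypothetical per-node input: each node holds a different slice of the book.
node_inputs = {
    "node1": ["school"] * 10 + ["book"] * 3,
    "node2": ["school"] * 15 + ["book"] * 1,
    "node3": ["school"] * 12,
}

NUM_REDUCERS = 2  # assumed setting for this sketch


def map_words(words):
    """Map phase: emit a (word, 1) pair for every word seen on this node."""
    return [(word, 1) for word in words]


def partition(key, num_reducers):
    """Pick a reducer for a key; a stable hash sends equal keys to the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(mapped_by_node, num_reducers):
    """Shuffle phase: collect every value for a key on the reducer the partitioner chose."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapped_by_node.values():
        for key, value in pairs:
            reducers[partition(key, num_reducers)][key].append(value)
    return reducers


def reduce_counts(grouped):
    """Reduce phase: sum all values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


mapped = {node: map_words(words) for node, words in node_inputs.items()}
totals = {}
for grouped in shuffle(mapped, NUM_REDUCERS):
    totals.update(reduce_counts(grouped))

print(totals["school"])  # 10 + 15 + 12 = 37
```

    No single node ever knows the total until the shuffle has gathered the three partial results for “school” in one place, which is the job Reduce exists to finish.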

    EDIT: As Tudor mentioned, by aggregation I mean aggregation in the more general sense of “bringing together”
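
    To make the “bringing together” point concrete, here is a minimal sketch of two different reducers applied to the same grouped input. The data (page numbers on which a word was seen) and the function names are hypothetical; the point is only that summing or counting is one kind of aggregation and removing duplicates is another.

```python
# Hypothetical grouped input after the shuffle: word -> page numbers where it was seen.
grouped = {
    "school": [3, 3, 7, 12, 12, 12],
    "teacher": [7, 7],
}


def reduce_count(key, values):
    """Aggregate by counting every occurrence of the key."""
    return key, len(values)


def reduce_distinct(key, values):
    """Aggregate by keeping each value once; this is where duplicate removal fits."""
    return key, sorted(set(values))


counts = dict(reduce_count(k, v) for k, v in grouped.items())
pages = dict(reduce_distinct(k, v) for k, v in grouped.items())

print(counts)  # {'school': 6, 'teacher': 2}
print(pages)   # {'school': [3, 7, 12], 'teacher': [7]}
```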

    EDIT2: To clarify RaffiM’s doubt:
    Continuing the above example, let’s say Node 1 had pages 1-10, Node 2 had pages 11-20 and Node 3 got pages 21-30. After the map phase, we know that pages 1-10 contain the word “school” 10 times, pages 11-20 contain it 15 times and pages 21-30 contain it 12 times. What we need is the total number of times the word appears in the whole book, so we still have to add these up. We need 10+15+12+the numbers for the other page ranges…

    If you don’t use a combiner, the mapper just emits “1” each time the word appears. So for pages 1-10, it emits <“school”, 1> as the output key-value pair 10 times. To make this more efficient, we use a combiner, which sums the counts at the mapper level. With a combiner, those 10 pairs are consumed on Node 1 itself, which produces a single consolidated output <“school”, 10> for Node 1.
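
    The effect of the combiner can be sketched as below. This is an illustration under assumptions: the word “teacher” and the function names are made up for the example, and a real framework decides when and how often to run the combiner.

```python
from collections import Counter

# Hypothetical words found on pages 1-10, held by Node 1.
node1_words = ["school"] * 10 + ["teacher"] * 2


def map_words(words):
    """Map phase: emit a (word, 1) pair for every occurrence."""
    return [(word, 1) for word in words]


def combine(pairs):
    """Combiner: pre-sum the counts per key on the mapper's node, before the shuffle."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return sorted(local.items())


without_combiner = map_words(node1_words)
with_combiner = combine(without_combiner)

print(len(without_combiner))  # 12 pairs would be sent to the reducers
print(with_combiner)          # [('school', 10), ('teacher', 2)]
```

    The reducer still has to add Node 1's 10 to the partial counts from the other nodes; the combiner only cuts down how many pairs each mapper sends.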
