The Archive Base


Editorial Team
Asked: May 15, 2026


I am trying to implement a hash join in Hadoop.

However, Hadoop seems to already have a map-side join and a reduce-side join implemented.

What is the difference between these techniques and hash join?


1 Answer

  1. Editorial Team
    Added an answer on May 15, 2026 at 6:00 am

    Map-side Join

    In a map-side (fragment-replicate) join, you hold one dataset in memory (in, say, a hash table) and join it against the other dataset record by record as that dataset streams through the mappers. In Pig, you'd write:

    edges_from_list = JOIN a_follows_b BY user_a_id, some_list BY user_id using 'replicated';
    

    taking care that the smaller dataset is on the right. This is extremely efficient: there is no shuffle or sort phase, so the main network cost is shipping the small dataset to each mapper, and the CPU demand is minimal.
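The map-side join described above is essentially a build-and-probe hash join. As a minimal sketch in plain Python (not Hadoop or Pig code), assuming records are tuples and using hypothetical sample data and a hypothetical helper name `replicated_hash_join`:

```python
from collections import defaultdict


def replicated_hash_join(large_records, small_records, large_key, small_key):
    """Join a streamed dataset against a small one held in an in-memory hash table."""
    # Build phase: index the small dataset by join key.
    # This table is the part that must fit in memory.
    table = defaultdict(list)
    for rec in small_records:
        table[small_key(rec)].append(rec)

    # Probe phase: stream the large dataset once and look up each record's key.
    for rec in large_records:
        for match in table.get(large_key(rec), ()):
            yield rec + match


# Hypothetical data shaped like the Pig example:
# a_follows_b holds (user_a_id, user_b_id); some_list holds (user_id, name).
a_follows_b = [(1, 2), (1, 3), (2, 3), (4, 1)]
some_list = [(1, "alice"), (2, "bob")]

edges_from_list = list(
    replicated_hash_join(a_follows_b, some_list, lambda r: r[0], lambda r: r[0])
)
print(edges_from_list)
```

In a real job each mapper would build the table from its own copy of the small dataset and probe it with its own split of the large one, so no records need to be exchanged between tasks.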

    Reduce Join

    In a reduce-side join, you group on the join key using Hadoop's standard merge sort, so that for every key a reducer receives the records from both datasets:

    <user_id   {A, B, F, ..., Z},  { A, C, G, ..., Q} >
    

    and emit a record for every pair of an element from the first set with an element from the second set:

    [A   user_id    A]
    [A   user_id    C]
    ...
    [A   user_id    Q]
    ...
    [Z   user_id    Q]
    

    You should design your keys so that the dataset with the fewest records per key comes first, because you need to hold the first group in memory and stream the second one past it. In Pig, for a standard join you accomplish this by putting the largest dataset last (as opposed to the fragment-replicate join, where the in-memory dataset is given last).
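The hold-one-group, stream-the-other pattern can be sketched in plain Python. This is only an illustration under stated assumptions: the in-process sort stands in for Hadoop's shuffle, and the function name, source tags, and sample data are hypothetical.

```python
from itertools import groupby


def reduce_side_join(first, second, key=lambda r: r[0]):
    """Simulate a reduce-side join of two lists of tuples on one machine."""
    # Map phase: tag each record with its source (0 or 1) so that, after
    # sorting, the first dataset's records precede the second's for each key.
    tagged = [(key(r), 0, r) for r in first] + [(key(r), 1, r) for r in second]

    # Stand-in for the shuffle: Hadoop would sort and partition across reducers.
    tagged.sort(key=lambda t: (t[0], t[1]))

    # Reduce phase: one group per join key.
    for _, group in groupby(tagged, key=lambda t: t[0]):
        held = []  # records from the first dataset for this key, kept in memory
        for _, tag, rec in group:
            if tag == 0:
                held.append(rec)
            else:
                # Stream each record of the second dataset past the held group.
                for left in held:
                    yield left + rec


# Hypothetical sample data: (user_id, value) pairs from two datasets.
first = [("u1", "A"), ("u1", "B"), ("u2", "C")]
second = [("u1", "X"), ("u1", "Y"), ("u3", "Z")]

joined = list(reduce_side_join(first, second))
print(joined)
```

Only `held` grows with the data, which is why the dataset with the fewest records per key should be the one tagged first. The pairs come out ordered by the streamed record rather than by the held one, but the set of pairs is the same cross product.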

    Note that for a map-side join the entirety of the smaller dataset must fit in memory. In a standard reduce-side join, only the groups for a single key must fit in memory at any one time (strictly, every group for that key except the last one, which is streamed). It's possible to avoid even this restriction, but it requires care; look, for example, at the skewed join in Pig.

    Merge Join

    Finally, if both datasets are stored in total-sorted order on the join key, you can do a merge join on the map side. As in the reduce-side join, you merge the two sorted inputs to cogroup on the join key, and then project (flatten) back out on the pairs.

    Because of this, when generating a frequently-read dataset it’s often a good idea to do a total sort in the last pass. Zebra and other databases may also give you total-sorted input for (almost) free.
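The merge join described above can be sketched in plain Python as a two-pointer walk over inputs already sorted on the join key. This is an illustration only: the function name and sample data are hypothetical, and it indexes into in-memory lists for brevity where a real implementation would stream both inputs.

```python
def merge_join(left, right, key=lambda r: r[0]):
    """Join two lists of tuples that are already sorted on the join key."""
    i = j = 0
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            i += 1  # left key has no partner yet; advance left
        elif kl > kr:
            j += 1  # right key has no partner yet; advance right
        else:
            # Find the run of equal keys on each side (the cogroup for this key).
            i_end = i
            while i_end < len(left) and key(left[i_end]) == kl:
                i_end += 1
            j_end = j
            while j_end < len(right) and key(right[j_end]) == kl:
                j_end += 1
            # Flatten the cogroup back out into pairs.
            for l_rec in left[i:i_end]:
                for r_rec in right[j:j_end]:
                    yield l_rec + r_rec
            i, j = i_end, j_end


# Hypothetical sample data, both sorted on the first field.
sorted_left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
sorted_right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]

merged = list(merge_join(sorted_left, sorted_right))
print(merged)
```

Because neither input needs to be re-sorted or shuffled, each input is read exactly once, which is what makes total-sorted storage attractive for frequently joined datasets.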

