The Archive Base

Editorial Team
Asked: June 9, 2026

Does anyone know how to implement the Natural-Join operation between two datasets in Hadoop?


More specifically, here’s exactly what I need to do:

I have two sets of data:

  1. Point information, stored as (tile_number, point_id:point_info). These are 1:n key-value pairs, meaning that for every tile_number there may be several point_id:point_info values.

  2. Line information, stored as (tile_number, line_id:line_info). These are again 1:m key-value pairs, and for every tile_number there may be more than one line_id:line_info value.

As you can see, the tile_numbers are the same between the two datasets. What I really need is to join these two datasets on tile_number. In other words, for every tile_number we have n point_id:point_info values and m line_id:line_info values, and I want to pair every point_id:point_info with every line_id:line_info for that tile_number.


To clarify, here’s an example:

For point pairs:

(tile0, point0)
(tile0, point1)
(tile1, point1)
(tile1, point2)

For line pairs:

(tile0, line0)
(tile0, line1)
(tile1, line2)
(tile1, line3)

What I want is the following:

For tile0:

 (tile0, point0:line0)
 (tile0, point0:line1)
 (tile0, point1:line0)
 (tile0, point1:line1)

For tile1:

 (tile1, point1:line2)
 (tile1, point1:line3)
 (tile1, point2:line2)
 (tile1, point2:line3)
1 Answer

  1. Editorial Team
    Added an answer on June 9, 2026 at 12:41 am

    Use a mapper that outputs tiles as keys and points/lines as values. You have to differentiate between the point output values and the line output values, for instance by prefixing each value with a special character (although a binary tag would be much better).

    So the map output will be something like:

     tile0, _point0
     tile1, _point0
     tile2, _point1 
     ...
     tileX, *lineL
     tileY, *lineK
     ...
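The tagging step can be sketched in plain Java, without the Hadoop classes, so that it runs on its own. The tab-separated input format, the class name `TagMapperSketch`, and the helper `mapRecord` are assumptions made for illustration; in a real job this logic would sit inside a `Mapper`, with one mapper (or one input-dependent branch) per dataset.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Plain-Java stand-in for the map step; not an actual Hadoop Mapper.
class TagMapperSketch {
    static final char POINT_TAG = '_'; // prefix for point values
    static final char LINE_TAG = '*';  // prefix for line values

    // Assumes each input record looks like "tile_number<TAB>payload".
    // Returns the (key, tagged value) pair the mapper would emit.
    static Map.Entry<String, String> mapRecord(String record, char tag) {
        String[] parts = record.split("\t", 2);
        return new SimpleEntry<>(parts[0], tag + parts[1]);
    }

    public static void main(String[] args) {
        System.out.println(mapRecord("tile0\tpoint0", POINT_TAG));
        System.out.println(mapRecord("tile0\tline1", LINE_TAG));
    }
}
```

The tag only has to be something that cannot be confused with the start of a real payload; a single prefix character is the simplest choice.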
    

    Then, at the reducer, your input will have this structure:

     tileX, [*lineK, ... , _pointP, ...., *lineM, ..., _pointR]
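To see why the reducer receives that structure, here is a small plain-Java imitation of the grouping that happens between map and reduce. It is only a sketch: `ShuffleSketch` and `groupByTile` are hypothetical names, and in a real Hadoop job the framework does this grouping for you, with no guaranteed order among the values of one key unless you add a secondary sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Imitates the shuffle: collects all tagged values that share a tile key.
class ShuffleSketch {
    // Each pair is {tile, taggedValue}, as emitted by the map step.
    static Map<String, List<String>> groupByTile(List<String[]> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String[]> mapOutput = Arrays.asList(
                new String[] {"tile0", "_point0"},
                new String[] {"tile1", "*line2"},
                new String[] {"tile0", "*line0"});
        System.out.println(groupByTile(mapOutput));
    }
}
```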
    

    You then have to take the values, separate the points from the lines, compute the cross product, and output each pair of the cross product, like this:

    tileX (lineK, pointP)
    tileX (lineK, pointR)
    ...
    

    If you can already easily differentiate between the point values and the line values (depending on your application's specifications), you don’t need the special characters (*, _).

    Regarding the cross product that you have to do in the reducer:
    First iterate through the entire list of values and separate them into two lists:

     List<String> points;
     List<String> lines;
    

    Then compute the cross product using two nested for loops.
    Then iterate through the resulting list and, for each element, output:

    tile(current key), element_of_the_resulting_cross_product_list
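Putting the reducer steps together, a minimal sketch in plain Java might look like the following. It assumes the `_` and `*` prefixes from the map output above; the class name `CrossProductReducerSketch` and the output format are illustrative only. In a real Hadoop `Reducer` the values arrive as an `Iterable` that can be traversed only once and whose objects are reused by the framework, so copying each value into the two lists (as done here with strings) is required rather than optional.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for the reduce step; not an actual Hadoop Reducer.
class CrossProductReducerSketch {
    static final char POINT_TAG = '_';
    static final char LINE_TAG = '*';

    // Returns the records the reducer would emit for one tile.
    static List<String> reduce(String tile, Iterable<String> taggedValues) {
        List<String> points = new ArrayList<>();
        List<String> lines = new ArrayList<>();
        // Single pass: strip the tag and sort each value into its list.
        for (String value : taggedValues) {
            if (value.isEmpty()) continue;
            String payload = value.substring(1);
            if (value.charAt(0) == POINT_TAG) points.add(payload);
            else if (value.charAt(0) == LINE_TAG) lines.add(payload);
        }
        // Cross product with two nested loops.
        List<String> output = new ArrayList<>();
        for (String line : lines) {
            for (String point : points) {
                output.add(tile + " (" + line + ", " + point + ")");
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("*line0", "_point0", "*line1", "_point1");
        reduce("tile0", values).forEach(System.out::println);
    }
}
```

Both lists are held in memory, so this works as long as the points and lines of a single tile fit in the reducer's heap. A tile with only points or only lines produces no output, which matches natural-join semantics.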
    