Editorial Team
Asked: May 17, 2026

Is there a distance calculation implementation using Hadoop MapReduce? I am trying to calculate the distances between a given set of points.

I am looking for any resources.

Edit

This is a very intelligent solution. I had tried something like the first algorithm, and I got almost what I was looking for. I am not concerned about optimizing the program at the moment; my problem was that the dist(X,Y) function was not working. Once all the points reached the reducer, I was unable to go through them with the Iterator and calculate the distances. Someone on stackoverflow.com told me that the Iterator in Hadoop is different from a normal Java Iterator, but I am not sure about that. If I can find a simple way to go through the Iterator in my dist() function, I can use your second algorithm to optimize.

// This is your code; I am referring to it just to make my point clear.
map(x,y) {
  for i in 1:N #number of points
    emit(i, (x,y)) // I did exactly this
}

reduce (i, X) {
  p1 = X[i]
  for j in i:N
    // Here is my problem: I can't get the values from the Iterator.
    emit(dist(X[i], X[j]))
}

1 Answer

  1. Editorial Team
    Added an answer on May 17, 2026 at 3:12 pm

    You need to do a self-join on that data set. In Hive, that would look more or less like this:

    select dist(P1.x, P1.y, P2.x, P2.y)
    from points P1 join points P2 on (True)
    where P1.x < P2.x or (P1.x = P2.x and P1.y < P2.y)
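
To see what the query's where clause selects, here is a minimal plain-Python sketch of the same self-join on a tiny in-memory list. It is not Hive code. The Euclidean metric and the names `dist` and `self_join_distances` are assumptions made for illustration; the question never says which distance is wanted.

```python
import math
from itertools import product


def dist(x1, y1, x2, y2):
    """Euclidean distance (an assumption: the metric is not stated in the question)."""
    return math.hypot(x2 - x1, y2 - y1)


def self_join_distances(points):
    """Cross join the list with itself, then keep each unordered pair once."""
    out = []
    for (x1, y1), (x2, y2) in product(points, repeat=2):
        # Same predicate as the where clause: strict lexicographic order on (x, y)
        if x1 < x2 or (x1 == x2 and y1 < y2):
            out.append(((x1, y1), (x2, y2), dist(x1, y1, x2, y2)))
    return out


points = [(0, 0), (3, 4), (0, 5)]
pairs = self_join_distances(points)
```

For n distinct points the predicate keeps n(n-1)/2 of the n^2 joined rows, so `pairs` holds three entries here and never pairs a point with itself.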
    

    The function dist would need to be implemented using other Hive functions or written in Java and added as a UDF. Also, I am not sure about the True constant, but you can write 0=0 to the same effect. The where clause avoids computing the same distance twice and computing zero distances. The question is whether Hive would optimize this as well as you can by programming carefully in Hadoop. I am not sure. Here is a sketch in Hadoop:

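    map(x, y) {
      for i in 1:N             # N is the number of points
        emit(i, (x, y))        # reducer i receives a copy of every point
    }

    reduce(i, X) {             # X holds all the points, in sorted order
      p1 = X[i]
      for j in (i+1):N         # start after i to skip the zero self-distance
        emit(dist(p1, X[j]))
    }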
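
On the Iterator problem raised in the edit to the question: as far as I know, the values handed to a Hadoop reducer can be traversed only once, so X[i]-style indexing is not available. The sketch below simulates the column-wise algorithm in plain Python (no Hadoop involved), using a one-pass iterator to model that constraint. The helper names, the in-memory shuffle, and the Euclidean metric are all assumptions for illustration.

```python
import math
from collections import defaultdict


def dist(p, q):
    # Euclidean metric: an assumption, the question does not name one
    return math.hypot(q[0] - p[0], q[1] - p[1])


def map_point(point, n):
    """Send a copy of the point to each of the n reducers (keys 1..n)."""
    for i in range(1, n + 1):
        yield i, point


def reduce_column(i, values):
    """values is a one-pass iterator over all points, in sorted order."""
    p1 = None
    for j, p in enumerate(values, start=1):
        if j == i:
            p1 = p              # remember the i-th point
        elif j > i:
            yield dist(p1, p)   # stream the later points; nothing else is kept


def run(points):
    n = len(points)
    groups = defaultdict(list)  # stands in for the shuffle phase
    for point in points:
        for key, value in map_point(point, n):
            groups[key].append(value)
    results = []
    for i in sorted(groups):
        # sorted() stands in for the secondary sort; iter() makes the values one-pass
        results.extend(reduce_column(i, iter(sorted(groups[i]))))
    return results


column_distances = run([(0, 0), (3, 4), (0, 5)])
```

The reducer counts positions while it walks the iterator instead of indexing into it, so it holds only p1 and the current point. In a real Java reducer you would also copy p1 rather than keep a reference, because Hadoop reuses the value object between iterations.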
    

    For this to work, X needs to reach the reducer sorted in some order, for instance by x and then by y, using secondary sort keys (which do not affect the grouping). This way every reducer gets a copy of all the points and works on one column of the distance matrix you are trying to generate. The memory requirements are minimal. You could reduce communication at the cost of some memory by reorganizing the computation so that every reducer computes a square submatrix of the final matrix, knowing only two subsets of the points and calculating the distances between all of them. To achieve this, you need to make the order of your points explicit. Say you are storing i, x, y:

    map(i, x, y) {
      for j in 1:N/k           # k is the size of a submatrix
        # i/k is the block index of point i (rounded up)
        emit((i/k, j), ("row", (x, y)))
        emit((j, i/k), ("col", (x, y)))
    }

    reduce((a, b), Z) {
      split Z into rows X and cols Y
      for x in X
        for y in Y
          emit(dist(x, y))
    }
    

    In this case the map phase emits only 2*N*N/k points, whereas the previous algorithm emitted N^2. Here we have (N/k)^2 reducers versus N for the other one. Each reducer has to hold k values in memory (using the secondary key technique so that all the rows reach the reducer before all the columns), versus only 2 before. So there are tradeoffs, and for the second algorithm you can use the parameter k for performance tuning.
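
The counts above can be checked with a small plain-Python simulation of the blocked scheme. This is a sketch under stated assumptions, not Hadoop code: points are indexed from 0, k divides N, the metric is Euclidean, and the `ri < i` filter (which keeps each unordered pair once) is an addition that the pseudocode does not contain.

```python
import math
from collections import defaultdict


def dist(p, q):
    # Euclidean metric: an assumption
    return math.hypot(q[0] - p[0], q[1] - p[1])


def map_point(i, point, n, k):
    """Emit the point to every block in its row band and in its column band."""
    band = i // k                       # 0-based block index of point i
    for j in range(n // k):             # assumes k divides n
        yield (band, j), ("row", i, point)
        yield (j, band), ("col", i, point)


def reduce_block(values):
    """Buffer the k row points, then stream the column points past them."""
    rows = []
    for tag, i, p in values:            # rows arrive first (secondary sort)
        if tag == "row":
            rows.append((i, p))
        else:
            for ri, rp in rows:
                if ri < i:              # added filter: each unordered pair once
                    yield (ri, i), dist(rp, p)


def run(points, k):
    n = len(points)
    groups = defaultdict(list)          # stands in for the shuffle phase
    emitted = 0
    for i, point in enumerate(points):
        for key, value in map_point(i, point, n, k):
            groups[key].append(value)
            emitted += 1
    distances = {}
    for key in groups:
        # Stable sort that puts "row" records before "col" records
        ordered = sorted(groups[key], key=lambda v: v[0] != "row")
        distances.update(reduce_block(iter(ordered)))
    return distances, emitted, len(groups)


pts = [(float(i), 0.0) for i in range(6)]
distances, emitted, reducers = run(pts, k=2)
```

With N = 6 and k = 2 the map phase emits 2*6*6/2 = 36 records across (6/2)^2 = 9 reducers, and each reducer buffers at most k = 2 row points. Raising k lowers the number of emitted records and reducers but raises the memory each reducer needs.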
