The Archive Base

Editorial Team
Asked: June 1, 2026

I have seen a lot of shared hashmap implementations. Here is the specific scenario that I am trying to tackle.

I am trying to do hierarchical clustering on a multiprocessor system. Let's say I run n threads on n processors, and let the total number of inputs be K. In the first iteration, we have to find the distances between all pairs of inputs (on the order of K^2 pairs) and store them in the hash map. To make this multithreaded, I assign each processor roughly K^2 / n input pairs to process.
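A minimal sketch of this work split, in C++ (the language of sparsehash). It assumes the pairs are unordered (i < j), so there are K(K-1)/2 of them rather than K^2, and that each thread receives one contiguous chunk; `Pair` and `partition_pairs` are hypothetical names, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical type: an unordered pair of input indices with i < j.
using Pair = std::pair<std::size_t, std::size_t>;

// Split the K*(K-1)/2 pairs into n contiguous chunks, one per thread.
// Chunk sizes differ by at most one pair.
std::vector<std::vector<Pair>> partition_pairs(std::size_t K, std::size_t n) {
    assert(n > 0);
    std::vector<Pair> all;
    for (std::size_t i = 0; i < K; ++i)
        for (std::size_t j = i + 1; j < K; ++j)
            all.emplace_back(i, j);

    std::vector<std::vector<Pair>> chunks(n);
    const std::size_t P = all.size();
    for (std::size_t t = 0; t < n; ++t) {
        // Thread t gets the half-open index range [P*t/n, P*(t+1)/n).
        const std::size_t begin = P * t / n;
        const std::size_t end = P * (t + 1) / n;
        chunks[t].assign(all.begin() + begin, all.begin() + end);
    }
    return chunks;
}
```

For K = 5 inputs and n = 3 threads there are 10 pairs, split into chunks of 3, 3 and 4.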

The distance results then have to be stored in some kind of hash map for use in later iterations. Each processor also outputs the smallest distance it found, and the pair with the minimum distance across all processors is merged.
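A sketch of the first iteration, assuming (hypothetically) that the inputs are 1-D points with distance |x_i - x_j| and using `std::unordered_map` as a stand-in for sparsehash. Each thread fills only its own map and tracks its own minimum, so no locks are needed; the caller then reduces the n local minima to the single pair to merge. `first_iteration`, `pair_key` and `Candidate` are illustrative names, and rows are dealt out round-robin as a simple approximation of the K^2 / n split.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <thread>
#include <unordered_map>
#include <vector>

// Assumption: std::unordered_map stands in for Google's sparsehash.
using Key = std::uint64_t;
using DistMap = std::unordered_map<Key, double>;

struct Candidate {
    double dist = std::numeric_limits<double>::infinity();
    std::size_t i = 0, j = 0;
};

// Encode the unordered pair (i, j), i < j, as a single 64-bit key.
inline Key pair_key(std::size_t i, std::size_t j, std::size_t K) {
    return static_cast<Key>(i) * K + j;
}

// Iteration 1: thread t writes only to maps[t] and local[t], then the
// caller reduces the n local minima to the global minimum.
Candidate first_iteration(const std::vector<double>& x, std::size_t n,
                          std::vector<DistMap>& maps) {
    assert(n > 0);
    const std::size_t K = x.size();
    maps.assign(n, DistMap{});
    std::vector<Candidate> local(n);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < n; ++t) {
        workers.emplace_back([&, t] {
            // Round-robin assignment: thread t handles rows i with i % n == t.
            for (std::size_t i = t; i < K; i += n)
                for (std::size_t j = i + 1; j < K; ++j) {
                    const double d = std::fabs(x[i] - x[j]);  // hypothetical metric
                    maps[t][pair_key(i, j, K)] = d;
                    if (d < local[t].dist) local[t] = {d, i, j};
                }
        });
    }
    for (auto& w : workers) w.join();

    Candidate best;  // reduction over the per-thread minima
    for (const auto& c : local)
        if (c.dist < best.dist) best = c;
    return best;
}
```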

In the next iteration, we need to find the distance between this newly merged pair and each of the other K-2 inputs, and compare these new distances with the distances of all the other pairs, which are already stored in the hash table.
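The question does not say which linkage is used. Assuming single linkage, the update after a merge can be sketched as below: only the distances from the merged cluster to the other K-2 clusters are recomputed, and every other stored distance carries over unchanged. `merge_single_linkage` and `key` are hypothetical names, and a sequential `std::map` is used purely for clarity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>

using Pair = std::pair<std::size_t, std::size_t>;
using DistMap = std::map<Pair, double>;

// Order-independent key: always store the smaller index first.
inline Pair key(std::size_t i, std::size_t j) {
    return i < j ? Pair(i, j) : Pair(j, i);
}

// Merge cluster b into cluster a, ASSUMING single linkage:
//   d(merged(a, b), x) = min(d(a, x), d(b, x)) for every other active x.
// Entries involving b are erased; the merged cluster keeps index a.
void merge_single_linkage(DistMap& d, std::set<std::size_t>& active,
                          std::size_t a, std::size_t b) {
    assert(a != b && active.count(a) && active.count(b));
    active.erase(b);
    for (std::size_t x : active) {
        if (x == a) continue;
        d[key(a, x)] = std::min(d.at(key(a, x)), d.at(key(b, x)));
        d.erase(key(b, x));
    }
    d.erase(key(a, b));
}
```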

Since there are concurrent writes to the hash tables, using a single hash table protected by one lock effectively kills the parallelism.

One requirement of the system is that each thread will NOT get the same pairs it got last time. So it has to read the hash tables written by itself and by the other threads to find the distances that have already been computed.

So I have come up with the following ideas:

    - Each thread has its own hash table and has access to the hash tables of the other threads.
    - Iteration 1: No reads are performed, since the hash tables are empty. Each thread just writes to its own hash table.
    - Iteration 2: Each thread generates some new pairs. For all the other (old) pairs, it needs to read the hash maps to find the distance (this might be its own hash map or another thread's).
    - Iterations 3 to K-1: Same as iteration 2.
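A sketch of the per-thread-table layout in this list, assuming `std::unordered_map` in place of sparsehash and a 64-bit pair key (both assumptions). A lookup checks the caller's own table first and then every other thread's; it is only safe without locks while no thread is writing. `find_distance` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

using Key = std::uint64_t;
using DistMap = std::unordered_map<Key, double>;  // stand-in for sparsehash

// tables[t] is owned (written) by thread t. During a read-only phase any
// thread may call this concurrently, because nothing is being modified.
std::optional<double> find_distance(const std::vector<DistMap>& tables,
                                    std::size_t self, Key k) {
    assert(self < tables.size());
    // Own table first: the most likely hit for pairs this thread computed.
    if (auto it = tables[self].find(k); it != tables[self].end())
        return it->second;
    for (std::size_t t = 0; t < tables.size(); ++t) {
        if (t == self) continue;
        if (auto it = tables[t].find(k); it != tables[t].end())
            return it->second;
    }
    return std::nullopt;  // the pair was never computed
}
```

Note that a pair stored by another thread may require probing up to n tables.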

To improve parallelism in iterations 2 to K-1, I have devised the following scheme:

            - Store the newly generated values in a new hash map.
            - For old values, keep reading the old hash maps. Since the old maps are not modified during this phase, concurrent reads are safe and this phase is completely parallel.
            - For each entry in the new hash map, find which thread's hash map holds this entry and replace the old value with the new value. This step might be effectively sequential, because we have to read and write at the same time.
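Step 3, written sequentially as described above, might look like the following sketch. It assumes `std::unordered_map` in place of sparsehash and 64-bit pair keys, and additionally assumes that a pair not found in any table (for example a distance to the newly merged cluster) goes into the writing thread's own table; `apply_updates` is a hypothetical name. The inner search over all tables is what makes this step the bottleneck.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using Key = std::uint64_t;
using DistMap = std::unordered_map<Key, double>;  // stand-in for sparsehash

// Runs after the parallel read phase: fold each thread's new values
// (fresh[t]) back into the old per-thread tables.
void apply_updates(std::vector<DistMap>& tables,
                   const std::vector<DistMap>& fresh) {
    assert(tables.size() == fresh.size());
    for (std::size_t writer = 0; writer < fresh.size(); ++writer) {
        for (const auto& [k, v] : fresh[writer]) {
            bool replaced = false;
            // Search every thread's table for the current owner of k.
            for (auto& table : tables) {
                if (auto it = table.find(k); it != table.end()) {
                    it->second = v;  // overwrite the old value in place
                    replaced = true;
                    break;
                }
            }
            // Assumption: brand-new pairs go into the writer's own table.
            if (!replaced) tables[writer][k] = v;
        }
    }
}
```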

Is this an efficient approach? If you have any suggestions for improving it, please let me know, especially for the third step, which is the bottleneck of the whole idea. If there is an efficient implementation that achieves the maximum amount of parallelism for this step, that would be great.

I am using Google's sparsehash library for the hash maps.

1 Answer

Editorial Team
Answered: June 1, 2026 at 12:53 am

One way to do this is to use a bucketed hash map: keep n maps, where map 0 stores the keys whose hashes are 0 mod n, map 1 stores the keys whose hashes are 1 mod n, and so on. A write then only needs to lock the one map that owns its key, so only one-nth of the table is locked at a time. Since you expect reads to be much more common than writes, you could also use shared locks for reads and exclusive locks for writes, which lowers contention even further.
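A minimal lock-striping sketch of this bucketed map, assuming C++17 `std::shared_mutex` for the reader/writer locks and `std::unordered_map` in place of sparsehash. `StripedMap` is a hypothetical class, not an existing library type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <unordered_map>
#include <vector>

// Key k lives in shard hash(k) % n; each shard has its own reader/writer lock.
class StripedMap {
public:
    explicit StripedMap(std::size_t n) : shards_(n) { assert(n > 0); }

    std::optional<double> get(std::uint64_t k) const {
        const Shard& s = shard_for(k);
        std::shared_lock lock(s.mu);  // many concurrent readers per shard
        auto it = s.map.find(k);
        if (it == s.map.end()) return std::nullopt;
        return it->second;
    }

    void put(std::uint64_t k, double v) {
        Shard& s = shard_for(k);
        std::unique_lock lock(s.mu);  // exclusive, but only for this shard
        s.map[k] = v;
    }

private:
    struct Shard {
        mutable std::shared_mutex mu;
        std::unordered_map<std::uint64_t, double> map;
    };

    Shard& shard_for(std::uint64_t k) {
        return shards_[std::hash<std::uint64_t>{}(k) % shards_.size()];
    }
    const Shard& shard_for(std::uint64_t k) const {
        return shards_[std::hash<std::uint64_t>{}(k) % shards_.size()];
    }

    std::vector<Shard> shards_;  // built once; shared_mutex is not movable
};
```

Choosing more shards than threads is a common way to make it less likely that two writers need the same shard at once.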

You could also add a “shuffle step” in which, rather than each thread writing the values it computed, each thread is responsible for all the writes to one particular bucket. Threads first write their new values to queues corresponding to the hash-table buckets (which can be done in various contention-minimizing ways), and then each thread consumes a single queue and performs all the writes to its own hash table in one batch, without any contention.
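The shuffle step can be sketched with plain threads and per-thread outboxes, assuming n worker threads and n shards. In phase 1, thread t writes only to its own row of queues; in phase 2, thread s is the only writer of shard s, so neither phase takes a lock, and joining the threads acts as the barrier between them. `shuffle_and_apply`, `shard_of` and `Update` are hypothetical names, and `std::unordered_map` stands in for sparsehash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

using Key = std::uint64_t;
using Update = std::pair<Key, double>;
using Shard = std::unordered_map<Key, double>;  // stand-in for sparsehash

inline std::size_t shard_of(Key k, std::size_t n) {
    return std::hash<Key>{}(k) % n;
}

// produced[t] holds the (key, distance) updates computed by thread t.
void shuffle_and_apply(const std::vector<std::vector<Update>>& produced,
                       std::vector<Shard>& shards) {
    const std::size_t n = shards.size();
    assert(n > 0 && produced.size() == n);

    // Phase 1: thread t routes its updates into outbox[t][s] by destination
    // shard s. Each thread touches only its own row, so no locks are needed.
    std::vector<std::vector<std::vector<Update>>> outbox(
        n, std::vector<std::vector<Update>>(n));
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < n; ++t)
        workers.emplace_back([&, t] {
            for (const Update& u : produced[t])
                outbox[t][shard_of(u.first, n)].push_back(u);
        });
    for (auto& w : workers) w.join();  // barrier between the two phases
    workers.clear();

    // Phase 2: thread s is the only writer of shards[s]; it drains column s.
    for (std::size_t s = 0; s < n; ++s)
        workers.emplace_back([&, s] {
            for (std::size_t t = 0; t < n; ++t)
                for (const Update& u : outbox[t][s])
                    shards[s][u.first] = u.second;
        });
    for (auto& w : workers) w.join();
}
```

If two threads produce an update for the same key, the one from the higher-numbered thread wins; in the clustering setting each pair is computed by exactly one thread, so that case should not arise.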

Related Questions

I've seen a lot of discussion about this subject on here. If i have
I have seen a lot of mobile phone apps that just open a web
[This is .Net 3.5.] I have seen a lot of examples that say, do
I have seen a lot of C# programs that use the [] , for
I have seen a lot of questions here regarding the Facebook Graph API but
I have seen a lot of related questions but none really helped. I'm trying
This is a UI element that I have seen a lot lately (in the
After I have seen a lot of questions here using the DATE_SUB() or DATE_ADD()
I have seen a lot of discussions going on and people asking about DataGrid
I have seen a lot of ob_get_clean() the last while. Typically I have done

© 2021 The Archive Base. All Rights Reserved