I have seen a lot of shared hashmap implementations. Here is the specific scenario that I am trying to tackle.
I am trying to do hierarchical clustering on a multiprocessor system. Let's say I run n threads on n processors, and let the total number of inputs be K. In the first iteration, we have to find the distances between all the pairs (about K^2) and store them in a hash map. To make this multithreaded, I assign each processor K^2 / n input pairs to process.
Now the distance results have to be stored in some kind of hash map for the next iterations. Each processor also outputs the smallest distance it found. The pair with the minimum distance among all processors is merged.
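To make the first iteration concrete, here is a minimal C++17 sketch of the "each thread reports its local minimum, then take the global minimum" step. It is only an illustration under assumptions that are not in the description above: the inputs are plain `double` values, the distance is the absolute difference, rows of the pair matrix are dealt out round-robin, and the names `Best` and `closest_pair` are made up. Storing the distances in a hash map is left out here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <thread>
#include <vector>

// Hypothetical result type: the smallest distance seen and the pair (i, j).
struct Best {
    double dist;
    std::size_t i;
    std::size_t j;
};

// Assumption: inputs are doubles and distance is |a - b|.
Best closest_pair(const std::vector<double>& points, std::size_t n_threads) {
    assert(n_threads > 0);
    const std::size_t K = points.size();
    const double inf = std::numeric_limits<double>::infinity();

    // One result slot per thread, so no locking is needed.
    std::vector<Best> local(n_threads, Best{inf, 0, 0});
    std::vector<std::thread> workers;
    workers.reserve(n_threads);

    for (std::size_t t = 0; t < n_threads; ++t) {
        workers.emplace_back([&points, &local, K, n_threads, inf, t] {
            Best mine{inf, 0, 0};
            // Thread t handles rows t, t + n, t + 2n, ... (round-robin).
            for (std::size_t i = t; i < K; i += n_threads) {
                for (std::size_t j = i + 1; j < K; ++j) {
                    const double d = std::fabs(points[i] - points[j]);
                    if (d < mine.dist) mine = Best{d, i, j};
                }
            }
            local[t] = mine;  // single write to this thread's own slot
        });
    }
    for (auto& w : workers) w.join();

    // Sequential reduction over n local minima.
    Best best = local[0];
    for (const Best& b : local) {
        if (b.dist < best.dist) best = b;
    }
    return best;
}
```

The final reduction touches only n values, so it is cheap compared with the pair computations; the returned pair is the one that would be merged.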
In the next iteration, we need to find the distance from this newly merged pair to all the other (K-2) inputs, and compare these new distances with the distances of all the other pairs that are already stored in the hash table.
Since there are concurrent writes on the hashtables, using a single hashtable with a lock effectively kills the parallelism.
One requirement of the system is that each thread will NOT get the same pairs it got last time. So it has to read the hash tables written by itself and by other threads to find the distances that have already been stored.
So I have come up with the following ideas:
-Each thread has its own hash table and has access to the hash table of other threads.
-Iteration 1 : No read is performed this time since the hash tables are empty. So each thread just writes to its own hash table.
-Iteration 2 : Each thread is going to generate some new pairs. But for all the other old pairs it needs to read the hash_maps to find the distance (might be its own hash_map or the hash_map of another thread).
-Iterations 3 to K-1 : Same as iteration 2.
To improve parallelism from iteration 2 to K-1, I have devised the following idea:
- store the newly generated values in a new hashmap.
- for old values keep reading the old hash_maps. Since concurrent reads can be done, this phase is completely parallel.
- for each entry in the new hash_map:
find which thread's hash_map holds this entry and replace the old value with the new value. This step might be effectively sequential because we have to both read and write at the same time.
Is this an efficient idea to implement? If you have any suggestions on how to improve this, please let me know. I am especially interested in the third step, since that is the bottleneck of this whole idea. If there is an efficient implementation that can achieve the maximum amount of parallelism for this step, that would be great.
I am using the sparse hash library from Google as the hash_map.
So one way to do this is to just have a bucketed hash map: have N maps, where map 0 stores the keys whose hashes are 0 mod N, map 1 stores the keys whose hashes are 1 mod N, and so on. Then a write only needs to lock one of the N maps, i.e. one N-th of the data, at a time. Since you expect reads to be much more common than writes, you could use shared locks for reads and exclusive locks for writes, which will lower your contention even more.
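A rough C++17 sketch of that bucketed map, in case it helps. This is an illustration and not a tuned implementation: it uses `std::unordered_map` and `std::shared_mutex` from the standard library instead of the sparse hash library, and the names `ShardedDistanceMap` and `pair_key` are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical key: the unordered pair (i, j) packed into 64 bits,
// so that (i, j) and (j, i) map to the same entry.
inline std::uint64_t pair_key(std::uint32_t i, std::uint32_t j) {
    if (i > j) std::swap(i, j);
    return (static_cast<std::uint64_t>(i) << 32) | j;
}

class ShardedDistanceMap {
public:
    explicit ShardedDistanceMap(std::size_t num_shards) : shards_(num_shards) {
        assert(num_shards > 0);
    }

    // Writer: exclusive lock on one shard only.
    void put(std::uint64_t key, double dist) {
        Shard& s = shards_[index(key)];
        std::unique_lock<std::shared_mutex> lock(s.mutex);
        s.map[key] = dist;
    }

    // Reader: shared lock, so readers of the same shard do not block each other.
    std::optional<double> get(std::uint64_t key) const {
        const Shard& s = shards_[index(key)];
        std::shared_lock<std::shared_mutex> lock(s.mutex);
        auto it = s.map.find(key);
        if (it == s.map.end()) return std::nullopt;
        return it->second;
    }

private:
    struct Shard {
        mutable std::shared_mutex mutex;
        std::unordered_map<std::uint64_t, double> map;
    };

    // Shard number = hash(key) mod N.
    std::size_t index(std::uint64_t key) const {
        return std::hash<std::uint64_t>{}(key) % shards_.size();
    }

    std::vector<Shard> shards_;
};
```

For example, `ShardedDistanceMap m(16); m.put(pair_key(3, 1), 2.5);` followed by `m.get(pair_key(1, 3))` yields an optional holding 2.5. Picking N a few times larger than the number of threads makes it less likely that two writers hit the same shard.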
You could also have a “shuffle step” where, rather than each thread writing the values it computed, each thread is responsible for all the writes to a particular bucket. Threads would first write new values to queues corresponding to the hash-table buckets (which you could do in various contention-minimizing ways). Then each thread would consume a single queue and perform all the writes to its own hash table in one big go, contention-free.
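Here is a sketch of the shuffle step in C++, under a few assumptions of mine: each producer thread gets its own row of per-bucket queues (plain vectors, so producers never share a queue), the producing phase has fully finished before the apply phase starts, and the bucket is chosen as `key % N`. The names `Update`, `Bucket`, `enqueue`, `apply_bucket` and `apply_shuffled` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

using Key = std::uint64_t;
using Update = std::pair<Key, double>;            // (pair key, new distance)
using Bucket = std::unordered_map<Key, double>;   // one hash table per bucket
using Outboxes = std::vector<std::vector<Update>>;  // one queue per bucket

// Phase 1 (producer side): route an update to the queue of its target bucket.
// Each producer owns its own Outboxes, so no lock is needed here.
inline void enqueue(Outboxes& mine, Key key, double dist) {
    assert(!mine.empty());
    mine[key % mine.size()].emplace_back(key, dist);
}

// Phase 2 (consumer side): apply every update destined for bucket b.
// Only the thread that owns bucket b calls this, so the writes are lock-free.
inline void apply_bucket(std::vector<Bucket>& buckets,
                         const std::vector<Outboxes>& all_outboxes,
                         std::size_t b) {
    for (const Outboxes& producer : all_outboxes) {
        for (const Update& u : producer[b]) {
            buckets[b][u.first] = u.second;  // later producers overwrite earlier ones
        }
    }
}

// Runs phase 2 with one thread per bucket. Must be called only after all
// producers have finished phase 1 (for example after joining them).
inline void apply_shuffled(std::vector<Bucket>& buckets,
                           const std::vector<Outboxes>& all_outboxes) {
    std::vector<std::thread> workers;
    workers.reserve(buckets.size());
    for (std::size_t b = 0; b < buckets.size(); ++b) {
        workers.emplace_back([&buckets, &all_outboxes, b] {
            apply_bucket(buckets, all_outboxes, b);
        });
    }
    for (auto& w : workers) w.join();
}
```

The trade-off is extra memory for the queues and a synchronization point between the two phases, in exchange for hash-table writes that need no locks at all.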