I’m trying to implement a collision attack on hashes (I’m taking a cryptography course). I therefore have two arrays of hashes (byte sequences, i.e. byte[]) and want to find the hashes that are present in both arrays. After some research and a lot of thinking, I am fairly sure that the best solution on a single-core machine is a HashSet: add all elements of the first array, then check via contains whether each element of the second array is already present.
However, I want to implement a concurrent solution, since I have access to a machine with 8 cores and 12 GB RAM. The best solution I can think of is a ConcurrentHashSet, which can be created via Collections.newSetFromMap(new ConcurrentHashMap<A,B>()). Using this data structure I could add all elements of the first array in parallel and, once all elements have been added, concurrently check via contains for identical hashes.
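A minimal sketch of that idea, assuming Java 8 parallel streams instead of Skandium. The names HashCollisionFinder and findCommon are made up for illustration. One detail matters here: byte[] uses identity-based equals and hashCode, so each array is wrapped in a ByteBuffer, which compares by content. The arrays must not be modified after they have been added.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical helper, not part of any library.
final class HashCollisionFinder {

    // Returns every hash from 'second' that also occurs in 'first'.
    static List<byte[]> findCommon(byte[][] first, byte[][] second) {
        // Concurrent set backed by a ConcurrentHashMap, as described above.
        Set<ByteBuffer> seen =
                Collections.newSetFromMap(new ConcurrentHashMap<ByteBuffer, Boolean>());

        // Phase 1: insert all hashes of the first array in parallel.
        // ByteBuffer.wrap gives content-based equals/hashCode without copying.
        Arrays.stream(first).parallel().forEach(h -> seen.add(ByteBuffer.wrap(h)));

        // Phase 2: starts only after phase 1 has finished; lookups are read-only.
        return Arrays.stream(second).parallel()
                .filter(h -> seen.contains(ByteBuffer.wrap(h)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        byte[][] first = { {1, 2, 3}, {4, 5, 6} };
        byte[][] second = { {4, 5, 6}, {7, 8, 9} };
        System.out.println("common hashes: " + findCommon(first, second).size());
    }
}
```

Because the two phases are strictly separated, the second phase never races with insertions; the concurrent map is only needed while the set is being filled.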
So my question is: Do you know an algorithm designed for this exact problem? If not, do you have experience with such a ConcurrentHashSet, in particular regarding pitfalls and effective runtime complexity? Or can you recommend another prebuilt data structure which could help me?
PS: If anyone is interested in the details: I plan to use Skandium to parallelize my program.
I think it would be a complete waste of time to use any form of HashMap. I am guessing you are calculating multi-byte hashes of various data; these are already hashes, so there is no need to perform any more hashing on them.
Although you do not state it, I am guessing your hashes are byte sequences. Clearly either a trie or a dawg would be ideal to store these.
I would therefore suggest you implement a trie/dawg and use it to store all of the hashes in the first array. You could then use all of your computing power in parallel to look up each element of your second array in this trie. No locks would be required.
Added
Here’s a simple Dawg implementation I knocked together. It seems to work.
Added
This could be a good start at a concurrent lock-free version. These things are notoriously difficult to test so I cannot guarantee this will work but to my mind it certainly should.
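To illustrate the general shape of such a structure, here is a rough sketch of a lock-free byte trie. It is a plain trie, not a dawg (no suffix sharing), and the names ByteTrie and TrieDemo are invented for this example. It assumes one 256-slot child array per node, which is simple but memory-hungry, so treat it as a starting point rather than a tuned implementation.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical sketch of a lock-free trie keyed by byte sequences.
final class ByteTrie {

    private static final class Node {
        // One slot per possible byte value; all slots start out null.
        final AtomicReferenceArray<Node> children = new AtomicReferenceArray<>(256);
        // Set once a complete key ends at this node.
        volatile boolean terminal;
    }

    private final Node root = new Node();

    // Safe to call from many threads at once; uses compare-and-set, no locks.
    void add(byte[] key) {
        Node node = root;
        for (byte b : key) {
            int i = b & 0xFF; // map the signed byte to an index in 0..255
            Node next = node.children.get(i);
            if (next == null) {
                Node fresh = new Node();
                // Only one thread wins the CAS; a loser adopts the winner's node.
                next = node.children.compareAndSet(i, null, fresh)
                        ? fresh
                        : node.children.get(i);
            }
            node = next;
        }
        node.terminal = true;
    }

    // Read-only walk down the trie; needs no synchronization.
    boolean contains(byte[] key) {
        Node node = root;
        for (byte b : key) {
            node = node.children.get(b & 0xFF);
            if (node == null) {
                return false;
            }
        }
        return node.terminal;
    }
}

final class TrieDemo {

    // Counts the hashes of 'second' that also occur in 'first'.
    static long countCommon(byte[][] first, byte[][] second) {
        ByteTrie trie = new ByteTrie();
        Arrays.stream(first).parallel().forEach(trie::add);            // parallel build
        return Arrays.stream(second).parallel().filter(trie::contains).count(); // parallel lookup
    }

    public static void main(String[] args) {
        byte[][] first = { {1, 2, 3}, {(byte) 0xFF, 0, 1} };
        byte[][] second = { {(byte) 0xFF, 0, 1}, {9, 9, 9} };
        System.out.println("common hashes: " + countCommon(first, second));
    }
}
```

Child slots are only ever changed from null to a node and never back, which is why a single compare-and-set per level is enough. For fixed-length hashes the terminal flag is not strictly needed, but it keeps the structure correct for keys of mixed length.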