Maybe this has been asked before (if so, I didn’t find it)…
I have a java.util.Set of approx. 50,000 Strings. I would like to generate some sort of hash to check whether it has been changed (by comparing the hashes of two versions of the Set).
If the Set changes, the hash has to be different.
How can that be achieved? Thanks!
EDIT:
Sorry for the misleading wording. I don’t want to check whether “it” (the same instance) has been changed. Instead, I want to check whether the results of two database queries, which produce two (possibly identical) instances of a Set of Strings, are equal.
Based on this statement:
“If the Set changes, the hash has to be different.”
It really can’t be achieved unless you have more constraints. In general, a hash is a value in some fixed space. For example, if your hash is a 32-bit integer, there are 2^32 possible hash values; in general, b bits give you 2^b possible hash values. To achieve what you want, you would have to make sure that the number of possible sets (i.e. the size of the set of all sets!) is less than or equal to 2^b, because otherwise at least two different sets must share a hash value (the pigeonhole principle). But my guess is that you can have arbitrary strings, so this isn’t possible. And even if it were possible, you’d have to come up with a way to map onto the hash space, which can be challenging.
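To make the pigeonhole argument concrete with Java’s own hashing: the Set contract defines hashCode() as the sum of the element hash codes, and the strings "Aa" and "BB" happen to share the String hash code 2112, so two different sets end up with the same hash. This is a minimal illustration, assuming Java 9+ for Set.of; the class name SetHashCollision is made up for the example.

```java
import java.util.Set;

// Illustrative only: two different sets whose hashCode() values collide.
class SetHashCollision {
    public static void main(String[] args) {
        // String.hashCode() of "Aa" is 65 * 31 + 97 = 2112,
        // and of "BB" it is 66 * 31 + 66 = 2112.
        Set<String> first = Set.of("Aa");
        Set<String> second = Set.of("BB");

        // The Set contract defines hashCode() as the sum of the element hash codes,
        // so both one-element sets hash to 2112.
        System.out.println(first.hashCode());      // 2112
        System.out.println(second.hashCode());     // 2112
        System.out.println(first.equals(second));  // false
    }
}
```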
However, with a good hash function, it’s unlikely that changing the set will produce the same hash value. So you can use the hash to detect inequality, but if the hashes are the same, you still need to check for equality. (This is the same idea behind a hash set or a hash map: elements are mapped to buckets based on their hash code, but you still have to check for equality within a bucket.)
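One way to apply this to the two-query case is sketched below, under stated assumptions: it swaps in a SHA-256 digest (via java.security.MessageDigest) instead of hashCode(), sorts the strings so the result does not depend on iteration order, length-prefixes each string so that {"ab", "c"} and {"a", "bc"} hash differently, and assumes the set contains no null elements. The names SetFingerprint, fingerprint and sameContents are hypothetical, and Java 9+ is assumed for List.of.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SetFingerprint {

    // Order-independent SHA-256 fingerprint of a set of (non-null) strings,
    // returned as 64 lowercase hex characters.
    static String fingerprint(Set<String> set) {
        // HashSet iteration order is unspecified, so sort to get a canonical order.
        List<String> sorted = new ArrayList<>(set);
        Collections.sort(sorted);

        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
        for (String s : sorted) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            // Length prefix keeps {"ab", "c"} and {"a", "bc"} from feeding identical bytes.
            md.update(ByteBuffer.allocate(Integer.BYTES).putInt(bytes.length).array());
            md.update(bytes);
        }
        return String.format("%064x", new BigInteger(1, md.digest()));
    }

    // Differing fingerprints prove the sets differ. Matching fingerprints make equality
    // overwhelmingly likely but not certain, so confirm with equals() when both sets exist.
    static boolean sameContents(Set<String> a, Set<String> b) {
        return fingerprint(a).equals(fingerprint(b)) && a.equals(b);
    }

    public static void main(String[] args) {
        Set<String> v1 = new HashSet<>(List.of("alpha", "beta", "gamma"));
        Set<String> v2 = new HashSet<>(List.of("gamma", "alpha", "beta"));
        Set<String> v3 = new HashSet<>(List.of("alpha", "beta"));

        System.out.println(fingerprint(v1).equals(fingerprint(v2))); // true
        System.out.println(sameContents(v1, v3));                    // false
    }
}
```

If both sets are in memory anyway, a.equals(b) on its own is simpler and cheaper than sorting and hashing. The fingerprint pays off when you store it alongside an earlier query result and only have the new set at hand: a differing fingerprint then proves the result changed, while a matching one means “almost certainly unchanged” rather than “guaranteed unchanged”.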
Similar to what Paul mentioned, but different: you can instead make a set implementation that carries a version number and make sure a new version number is always generated when the set is mutated. Then you can compare version numbers. I’m not sure whether you care about immutable sets, or about the case where a mutable set changes back to a state you have already seen (i.e. whether it should then get the same version again).
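A rough sketch of the version-number idea, assuming a single-threaded wrapper around HashSet; VersionedSet and version() are hypothetical names, not an existing API. It extends AbstractSet so that bulk operations such as addAll, removeAll and retainAll are routed through the overridden add, remove and iterator removal, and only calls that actually modify the set get a new version. A shared AtomicLong hands out version numbers so that no two mutations, on any instance, ever receive the same one.

```java
import java.util.AbstractSet;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical wrapper: every effective mutation assigns a fresh version number.
// Not thread-safe: the version field is updated without synchronization.
class VersionedSet<E> extends AbstractSet<E> {
    // Shared counter, so version numbers are never reused across instances.
    private static final AtomicLong NEXT_VERSION = new AtomicLong();

    private final Set<E> delegate = new HashSet<>();
    private long version = NEXT_VERSION.incrementAndGet();

    public long version() { return version; }

    private void bump() { version = NEXT_VERSION.incrementAndGet(); }

    @Override public int size() { return delegate.size(); }
    @Override public boolean contains(Object o) { return delegate.contains(o); }

    @Override public boolean add(E e) {
        boolean changed = delegate.add(e);
        if (changed) bump(); // adding an existing element keeps the version
        return changed;
    }

    @Override public boolean remove(Object o) {
        boolean changed = delegate.remove(o);
        if (changed) bump();
        return changed;
    }

    @Override public void clear() {
        if (!delegate.isEmpty()) {
            delegate.clear();
            bump();
        }
    }

    // Wrap the iterator so Iterator.remove() (used by retainAll, removeIf, ...) also bumps.
    @Override public Iterator<E> iterator() {
        Iterator<E> it = delegate.iterator();
        return new Iterator<E>() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public E next() { return it.next(); }
            @Override public void remove() { it.remove(); bump(); }
        };
    }

    public static void main(String[] args) {
        VersionedSet<String> set = new VersionedSet<>();
        long before = set.version();
        set.add("alpha");
        System.out.println(set.version() != before);   // true
        long afterAdd = set.version();
        set.add("alpha");                              // already present: no change
        System.out.println(set.version() == afterAdd); // true
    }
}
```

Note that this tracks mutations, not contents: two sets built by separate queries always have different versions even if they hold the same strings, and a set that is changed back to an earlier state gets a new version rather than the old one. For comparing two independently produced query results, a content-based hash is the closer fit.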
Hope this helps.