I have a small problem: I need to hash a number of about 10 digits into a number of 6 digits. The hash needs to be deterministic.
Above all, the hash should not be resource-intensive.
For example, say that I have some number x, like 123456789.
I want to write a hash function that gives me back a number y, like 987654.
I'd then like to have a function that takes x and y as parameters, re-applies the hash to x, and checks that the result is y.
It should be difficult to compute possible input values given the hash.
My first idea of multiplying pairs of digits led to a lot of duplicate hashed values.
I have the feeling that this sort of problem has some kind of elegant solution, but I just can’t think of it myself.
Can anyone help me out here? Thanks in advance 🙂
What you want is to distribute the hash values as evenly as possible over the range. Some of the built-in hashing methods are fairly good at this, so you could try taking the hash code of the number's string representation and simply throwing away the bits you don't need.
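A minimal sketch of that idea, written in Java as a stand-in for the .NET framework the answer refers to: Java's `String.hashCode()` plays the role of .NET's `x.ToString().GetHashCode()`. The class and method names are hypothetical. Reducing modulo 1,000,000 with `Math.floorMod` keeps the result in the 6-digit range (and non-negative) by discarding the remaining high-order information.

```java
// Sketch: derive a 6-digit value from the built-in hash of the number's string form.
// StringHashSketch and sixDigitHash are hypothetical names for illustration.
class StringHashSketch {

    // Maps x to a value in 0..999999 using String.hashCode() on its decimal form.
    static int sixDigitHash(long x) {
        int h = Long.toString(x).hashCode();
        // floorMod keeps the result non-negative even when h is negative.
        return Math.floorMod(h, 1_000_000);
    }

    public static void main(String[] args) {
        // Pad with leading zeros so the result always prints as six digits.
        System.out.println(String.format("%06d", sixDigitHash(123456789L)));
    }
}
```

Note that a result below 100000 has fewer than six significant digits, so pad it when displaying or storing it as text.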
However, it also depends on what you are going to use the hash code for. The built-in hash codes are not intended to be stored permanently. The algorithms for calculating them can change with any new version of the framework, so if you store the hash codes in a database they may become useless in the future. In that case you would instead have to write the hashing algorithm yourself from scratch, or use a hashing algorithm that was designed for permanent storage.
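As one example of a self-written algorithm whose output will not change between framework versions, here is a sketch using 32-bit FNV-1a, a simple, well-known non-cryptographic hash. FNV-1a is a swapped-in choice for illustration, not something the answer prescribes; the class name is hypothetical. It hashes the 8 bytes of the `long` and reduces the result to six digits.

```java
// Sketch: a stable, self-contained hash (32-bit FNV-1a) suitable for persisting,
// because its definition does not depend on any runtime's built-in hash codes.
class Fnv1aSketch {
    private static final int FNV_OFFSET_BASIS = 0x811C9DC5; // 2166136261
    private static final int FNV_PRIME = 0x01000193;        // 16777619

    // FNV-1a over the 8 bytes of x, least significant byte first.
    static int fnv1a(long x) {
        int h = FNV_OFFSET_BASIS;
        for (int i = 0; i < 8; i++) {
            h ^= (int) ((x >>> (8 * i)) & 0xFF); // mix in the next byte
            h *= FNV_PRIME;                      // int overflow wraps, as FNV expects
        }
        return h;
    }

    // Reduce the 32-bit hash to the range 0..999999.
    static int sixDigitHash(long x) {
        return Math.floorMod(fnv1a(x), 1_000_000);
    }
}
```

Because the constants and steps are fixed in your own code, the same input yields the same 6-digit value on every platform and version.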
One simple technique the framework uses for the hash codes of some values is to combine parts of the value with exclusive or (XOR), so that every bit of the value affects the hash code even when the hash code is smaller than the data.
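A sketch of that XOR fold for a 64-bit value, in Java: the low and high 32-bit halves are extracted explicitly and XORed together. The class and method names are hypothetical; a final `floorMod` brings the folded value into the 6-digit range.

```java
// Sketch: fold a 64-bit value into 32 bits with XOR so every input bit matters.
class XorFoldSketch {

    // XOR the low 32 bits with the high 32 bits.
    static int fold(long value) {
        int low = (int) (value & 0xFFFFFFFFL); // bits 0..31
        int high = (int) (value >>> 32);       // bits 32..63
        return low ^ high;
    }

    // Reduce the folded value to the range 0..999999.
    static int sixDigitHash(long value) {
        return Math.floorMod(fold(value), 1_000_000);
    }
}
```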
There is also a more efficient, though less obvious, way to do the same thing.
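One compact way to write the same fold, assuming that is the kind of shortcut meant here: XOR the value with itself shifted right by 32, then cast to `int`. The cast keeps only the low 32 bits, and the low 32 bits of `value >> 32` are exactly the original high half, so the result equals the explicit low-XOR-high version. The class name is hypothetical.

```java
// Sketch: the same XOR fold in one expression; the (int) cast keeps the low 32 bits,
// which hold low ^ high regardless of the sign of value.
class CompactFoldSketch {

    static int fold(long value) {
        return (int) (value ^ (value >> 32));
    }
}
```

This saves the separate mask and second cast, at the cost of relying on how the cast truncates.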
Of course, this does nothing to obfuscate small values (a value that fits in 32 bits comes out essentially unchanged), so you might want to mix in some “random” bits to make the hash code significantly different from the value.
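A sketch that XORs the input with a fixed "random" 64-bit constant before folding, reduces the result to six digits, and adds the check function the question asks for. The constant `SALT` is an arbitrary value chosen for illustration, and the class and method names are hypothetical. This only disguises the value: anyone who knows the constant can easily find inputs for a given output, so if guessing inputs must be genuinely hard, a keyed cryptographic hash such as HMAC would be the appropriate tool.

```java
// Sketch: obfuscate with a fixed constant, fold to 32 bits, reduce to 6 digits,
// and verify a (x, y) pair by re-applying the hash.
class ObfuscatedHashSketch {
    // Arbitrary constant for illustration only; any fixed 64-bit value works.
    private static final long SALT = 0x5A17C0DE3B9E4F21L;

    static int sixDigitHash(long x) {
        long mixed = x ^ SALT;                        // flip bits so small x no longer passes through
        int folded = (int) (mixed ^ (mixed >> 32));   // XOR fold to 32 bits
        return Math.floorMod(folded, 1_000_000);      // range 0..999999
    }

    // Re-applies the hash to x and checks that it matches y.
    static boolean verify(long x, int y) {
        return sixDigitHash(x) == y;
    }

    public static void main(String[] args) {
        int y = sixDigitHash(123456789L);
        System.out.println(String.format("%06d", y));
        System.out.println(verify(123456789L, y));
    }
}
```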