I want to hash a char array in to an int or a long. The resulting value has to adhere to a given precision value.
The function I’ve been using is given below:
int GetHash(const char* zKey, int iPrecision /*= 6*/)
{
/////FROM : http://courses.cs.vt.edu/~cs2604/spring02/Projects/4/elfhash.cpp
unsigned long h = 0;
long M = pow(10, iPrecision);
while(*zKey)
{
h = (h << 4) + *zKey++;
unsigned long g = h & 0xF0000000L;
if (g) h ^= g >> 24;
h &= ~g;
}
return (int) (h % M);
}
The string to be hashed is similar to “SAEUI1210.00000010_1”.
However, this produces duplicate values in some cases.
Are there any good alternatives which wouldn’t duplicate the same hash for different string values.
The very definition of a hash is that it produces duplicate values for some values, due to hash value range being smaller than the space of the hashed data.
In theory, a 32-bit hash has enough range to hash all ~6 character strings (A-Z,a-z,0-9 only), without causing a collision. In practice, hashes are not a perfect permutation of the input. Given a 32-bit hash, you can expect to get hash collisions after hashing ~16 bit of random inputs, due to the birthday paradox.
Given a static set of data values, it’s always possible to construct a hash function designed specifically for them, which will never collide with itself (of course, size of its output will be at least
log(|data set|). However, it requires you to know all the possible data values ahead of time. This is called perfect hashing.That being said, here are a few alternatives which should get you started (they are designed to minimize collisions)