Recently, I was curious how hash algorithms for floating points worked, so I looked at the source code for boost::hash_value. It turns out to be fairly complicated. The actual implementation loops over each digit in the radix and accumulates a hash value. Compared to the integer hash functions, it’s much more involved.
My question is: why should a floating-point hash algorithm be any more complicated? Why not just hash the binary representation of the floating point value as if it was an integer?
Like:
std::size_t hash_value(float f)
{
return hash_value(*(reinterpret_cast<int*>(&f)));
}
I realize that float is not guaranteed to be the same size as int on all systems, but that sort of thing could be handled with a few template meta-programs to deduce an integral type that is the same size as float. So what is the advantage of introducing an entirely different hash function that specifically operates on floating point types?
Why are you wanting to hash floating point values? For the same reason that comparing floating point values for equality has a number of pitfalls, hashing them can have similar (negative) consequences.
However given that you really do want to do this, I suspect that the boost algorithm is complicated because when you take into account denormalized numbers different bit patterns can represent the same number (and should probably have the same hash). In IEEE 754 there are also both positive and negative
0values that compare equal but have different bit patterns.This probably wouldn’t come up in the hashing if it wouldn’t have come up otherwise in your algorithm but you still need to take care about signaling NaN values.
Additionally what would be the meaning of hashing +/- infinity and/or NaN? Specifically NaN can have many representations, should they all result in the same hash? Infinity seems to have just two representations so it seems like it would work out ok.