I’m looking for an efficient way to hash a 6-byte field (a MAC address) so that it can be used as a key in std::unordered_map.
I think this would be the conventional way of creating a hash:
    #include <array>
    #include <cstddef>
    #include <cstdint>
    #include <boost/functional/hash.hpp>

    struct Hash {
        std::size_t operator()(const std::array<uint8_t, 6> & mac) const {
            std::size_t key = 0;
            boost::hash_combine(key, mac[0]);
            boost::hash_combine(key, mac[1]);
            boost::hash_combine(key, mac[2]);
            boost::hash_combine(key, mac[3]);
            boost::hash_combine(key, mac[4]);
            boost::hash_combine(key, mac[5]);
            return key;
        }
    };
However, I noticed that I can make it somewhat faster (~20%) using this trick:
    struct Hash {
        std::size_t operator()(const std::array<uint8_t, 6> & mac) const {
            std::size_t key = 0;
            // Possibly UB?
            boost::hash_combine(key, reinterpret_cast<const uint32_t&>(mac[0]));
            boost::hash_combine(key, reinterpret_cast<const uint16_t&>(mac[4]));
            return key;
        }
    };
And this was even faster:
    struct Hash {
        std::size_t operator()(const std::array<uint8_t, 6> & mac) const {
            // Requires size_t to be 64-bit.
            static_assert(sizeof(std::size_t) >= 6, "MAC address doesn't fit in std::size_t!");
            std::size_t key = 0;
            // Likely UB?
            boost::hash_combine(key, 0x0000FFFFFFFFFFFF & reinterpret_cast<const uint64_t&>(mac[0]));
            return key;
        }
    };
My question is two-fold:
- Are these optimizations going to result in UB?
- Is my first solution the way to go? Or is there a better way?
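Your optimizations break the strict aliasing rules, which (per the standard) leads to undefined behavior. They can also perform misaligned reads: a std::array<uint8_t, 6> need not be aligned for uint32_t or uint64_t.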
The last optimization worries me the most, since it reads 8 bytes from a 6-byte array. The two bytes past its end may belong to another object or to protected memory, in which case the read can trap.
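For comparison, here is a minimal sketch of how the 64-bit idea is usually written without any of these problems, assuming a 64-bit intermediate is acceptable. It copies exactly six bytes into a zeroed integer with std::memcpy, which is well defined for trivially copyable types, so there is no aliasing violation, no misaligned load and no read past the end. Compilers typically lower such a fixed-size copy to plain loads. The names MemcpyMacHash and MacToPort are hypothetical, and std::hash<std::uint64_t> stands in for boost::hash_combine so that the example has no Boost dependency. The numeric value of key depends on the machine's byte order, which does not matter for a hash used within one process.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <unordered_map>

// Hypothetical name. Same idea as the masked 64-bit read, but only the six
// bytes that belong to the array are copied into a zero-initialized integer.
struct MemcpyMacHash {
    std::size_t operator()(const std::array<std::uint8_t, 6>& mac) const {
        std::uint64_t key = 0;                      // the two extra bytes stay zero
        std::memcpy(&key, mac.data(), mac.size());  // copies exactly 6 bytes
        return std::hash<std::uint64_t>{}(key);
    }
};

// Hypothetical map from MAC address to, say, a switch port number.
using MacToPort = std::unordered_map<std::array<std::uint8_t, 6>, int, MemcpyMacHash>;
```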
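Any reason you are not using boost::hash_range?

Since boost::hash_range turns out not to be as fast as required, I would propose another solution, based on aliasing. Or rather, two solutions in one.

The first idea is that aliasing can be subdued by using char* as a temporary type. The aliasing rules allow any object to be accessed through a char or unsigned char lvalue, so copying the six bytes of the MAC address into a zero-initialized std::size_t through such a pointer is a valid implementation of the hash.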
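One possible way to write that char*-based hash is sketched below. It assumes, as the question's last version does, that std::size_t holds at least six bytes, and CharCopyMacHash and MacMap are hypothetical names. Because every MAC byte lands in its own byte of key and the remaining bytes stay zero, distinct addresses always produce distinct hash values.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical name. Writes the six MAC bytes into the object representation
// of a std::size_t through an unsigned char pointer; character types may
// alias any object, so this does not violate strict aliasing.
struct CharCopyMacHash {
    std::size_t operator()(const std::array<std::uint8_t, 6>& mac) const {
        static_assert(sizeof(std::size_t) >= 6, "MAC address doesn't fit in std::size_t!");
        std::size_t key = 0;  // bytes not overwritten below remain zero
        unsigned char* out = reinterpret_cast<unsigned char*>(&key);
        std::copy(mac.begin(), mac.end(), out);
        return key;
    }
};

using MacMap = std::unordered_map<std::array<std::uint8_t, 6>, int, CharCopyMacHash>;
```

Returning the raw 48-bit value means the hash does no mixing of its own; whether that spreads keys well enough depends on how the standard library maps hash values to buckets.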
However, we can go one step further. Because of alignment and padding, a char[6] and a char[8] are likely to occupy the same amount of memory within a map node. Therefore, we could enrich the type by putting the byte array in a union with a 64-bit integer.

Now, you can encapsulate this properly within a class (making sure you always initialize the 7th and 8th bytes to 0) and implement only the parts of the std::array<unsigned char, 6> interface that you actually need.

I’ve used a similar trick for tiny strings (below 8 characters) for hashing and fast (non-alphabetic) comparisons, and it’s really sweet.
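A minimal sketch of such an encapsulated type follows. The names PaddedMac and PaddedMacHash and the chosen member functions are hypothetical. It swaps the union for an 8-byte, 8-aligned buffer read with std::memcpy, because in standard C++ reading a union member other than the one last written is undefined; GCC documents union type-punning as supported, but that is a compiler extension. The padding bytes are zero-initialized and the class offers no way to write them, so hashing and equality can each work on a single 64-bit value.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <unordered_map>

// Hypothetical class: a MAC address stored in an 8-byte, 8-aligned buffer.
// Bytes 6 and 7 (the 7th and 8th bytes) are always zero.
class PaddedMac {
public:
    PaddedMac() = default;
    explicit PaddedMac(const std::array<std::uint8_t, 6>& mac) {
        std::memcpy(bytes_, mac.data(), mac.size());
    }

    // Only the small slice of the std::array interface this sketch needs.
    static constexpr std::size_t size() { return 6; }
    std::uint8_t operator[](std::size_t i) const { return bytes_[i]; }

    // One 64-bit load of the whole buffer, including the zero padding.
    std::uint64_t as_u64() const {
        std::uint64_t v;
        std::memcpy(&v, bytes_, sizeof v);
        return v;
    }

    // Fast, non-alphabetic equality: a single integer comparison.
    friend bool operator==(const PaddedMac& a, const PaddedMac& b) {
        return a.as_u64() == b.as_u64();
    }
    friend bool operator!=(const PaddedMac& a, const PaddedMac& b) {
        return !(a == b);
    }

private:
    alignas(std::uint64_t) std::uint8_t bytes_[8] = {};  // padding stays zero
};

struct PaddedMacHash {
    std::size_t operator()(const PaddedMac& mac) const {
        return std::hash<std::uint64_t>{}(mac.as_u64());
    }
};
```

Usage would look like std::unordered_map<PaddedMac, int, PaddedMacHash>. The invariant that the last two bytes are zero is what makes comparing whole 64-bit words correct, so any setter added later must preserve it.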