I am implementing a memcached client library. I want it to support several servers, so I wish to add some load-balancing system.
Basically, you can do two operations on a server:
- Store a value given its key.
- Get a value given its key.
Let us say I have N servers (numbered 0 to N - 1). I'd like a repartition function which, given a key and the server count N, gives me an index in the [0, N[ range.
unsigned int getServerIndex(const std::string& key, unsigned int serverCount);
The function should be as fast and simple as possible and must respect the following constraint:
getServerIndex(key, N) == getServerIndex(key, N); // i.e. deterministic: no random return.
I wish I could do this without using an external library (like OpenSSL and its hashing functions). What are my options here?
Side notes:
Obviously, the basic implementation:
unsigned int getServerIndex(const std::string& key, unsigned int serverCount)
{
return 0;
}
is not a valid answer, as it is not exactly a good repartition function 😀
Additional information:
Keys can be any possible string within the ANSI charset (mostly [a-zA-Z0-9_-]). The size may be anything from a one-character key to whatever size you want.
A good repartition algorithm is one for which the probability of returning index a is equal to (or not too far from) the probability of returning index b, for two different keys. The number of servers might change (rarely, though), and if it does, it is acceptable that the returned index for a given key changes as well.
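For illustration, here is a minimal sketch of the simplest kind of answer: a small non-cryptographic hash (FNV-1a, whose offset and prime constants below are the standard 32-bit ones) reduced modulo the server count. It is deterministic and needs no external library, though it reshuffles most keys whenever serverCount changes:

#include <cstdint>
#include <string>

// Minimal sketch: FNV-1a hash of the key, reduced modulo the server count.
// Deterministic: the same (key, serverCount) pair always yields the same index.
// serverCount must be non-zero.
unsigned int getServerIndex(const std::string& key, unsigned int serverCount)
{
    std::uint32_t hash = 2166136261u;  // FNV-1a offset basis
    for (unsigned char c : key)
    {
        hash ^= c;
        hash *= 16777619u;             // FNV-1a prime
    }
    return hash % serverCount;
}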
You’re probably looking for something that implements consistent hashing. The easiest way to do this is to assign a random ID to each memcache server, and allocate each item to the memcache server which has the closest ID to the item’s hash, by some metric.
A common choice, and the one taken by distributed systems such as Kademlia, is to use the SHA1 hash function (though the particular hash is not important) and to compare distances by XORing the hash of the item with the hash of the server, interpreting the result as a magnitude. All you need, then, is a way of making each client aware of the list of memcache servers and their IDs.
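A rough sketch of that selection step, with the FNV-1a hash from above standing in for SHA1 (since, as noted, the particular hash is not important and the question rules out external libraries), and with the servers' random IDs supplied by the caller:

#include <cstdint>
#include <string>
#include <vector>

// Same FNV-1a hash as in the earlier sketch, reused for key hashing.
std::uint32_t hashKey(const std::string& key)
{
    std::uint32_t h = 2166136261u;
    for (unsigned char c : key)
    {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

// Sketch: return the index of the server whose random ID is closest to the
// key's hash under the XOR metric, as Kademlia does. serverIds holds one
// randomly generated ID per memcache server and must not be empty.
std::size_t closestServer(const std::string& key,
                          const std::vector<std::uint32_t>& serverIds)
{
    const std::uint32_t keyHash = hashKey(key);
    std::size_t best = 0;
    std::uint32_t bestDistance = keyHash ^ serverIds[0];
    for (std::size_t i = 1; i < serverIds.size(); ++i)
    {
        const std::uint32_t distance = keyHash ^ serverIds[i]; // XOR "distance"
        if (distance < bestDistance)
        {
            bestDistance = distance;
            best = i;
        }
    }
    return best;
}

Unlike the modulo approach, changing the server list here only moves the keys whose closest ID actually changed.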
When a new memcache server joins, it need only generate its own random ID, then ask its new neighbours to send it any items that are closer to its hash than to their own; when a server leaves, its items simply fall to the next-closest remaining ID.
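As a hypothetical sketch of that handover (the function name and shape are mine, not part of any memcached API), a neighbour could select the items to send like this:

#include <cstdint>
#include <vector>

// Hypothetical helper: given this server's ID, the newcomer's ID, and the
// hashes of locally stored items, return the hashes the newcomer should own,
// i.e. those closer to its ID than to ours under the XOR metric.
std::vector<std::uint32_t> itemsToTransfer(std::uint32_t myId,
                                           std::uint32_t newId,
                                           const std::vector<std::uint32_t>& itemHashes)
{
    std::vector<std::uint32_t> toSend;
    for (std::uint32_t h : itemHashes)
    {
        if ((h ^ newId) < (h ^ myId)) // the newcomer is now the closer server
            toSend.push_back(h);
    }
    return toSend;
}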