I have an environment that serves many devices spread across three time zones by receiving and sending data during the wee hours of the night. Each device's connection time was assigned pseudo-randomly from its identification number using a simple modulo calculation. This produces an unnecessary artificial peak that consumes more resources than I'd like during certain hours of the night.
As part of our protocol I can instruct devices when to connect to our system on subsequent nights.
I am seeking an algorithm that can flatten the peak into a more level line (albeit one that is higher at most times), or at least a shove in the right direction, meaning the terminology I should spend my time reading about. The inputs I have available for the calculation are the device's identification number, the current time, and the device's time zone. I can also perform some up-front analysis to create pools from which to draw slots, though I feel this approach may be less elegant than I am hoping for (though a learning algorithm may not be a bad thing…).
(Ultimately and somewhat less relevant I will be implementing this algorithm using C#.)
If you want to avoid the spikes associated with using random times, look at the various hashing functions used for hashtables. Your reading might start with the Wikipedia article on the subject:
http://en.wikipedia.org/wiki/Hash_function
Basically, divide your update window into an appropriate number of buckets. One option is one bucket per second: 3 hours * 60 minutes * 60 seconds = 10800 buckets. Use that count as the hashtable size for your chosen hashing function, with the device ID as the unique input. Don't forget to express the window in GMT, so that devices in different time zones are all scheduled against the same clock. Your programming language of choice probably has a number of built-in hashing functions, but the article should provide some links to get you started if you want to implement one from scratch.
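To make the bucket idea concrete, here is a minimal sketch in Python (the same logic should port to C# without much trouble). It assumes device IDs are strings and that the window starts at a GMT/UTC time you choose; the names `bucket_for` and `next_connect_time` are made up for illustration. It uses SHA-256 from the standard library rather than Python's built-in `hash()`, because the built-in is salted per process for strings and would hand the same device a different slot after a restart.

```python
import hashlib
from datetime import datetime, timedelta, timezone

# One bucket per second of a 3-hour window: 3 * 60 * 60 = 10800.
WINDOW_SECONDS = 3 * 60 * 60


def bucket_for(device_id: str, buckets: int = WINDOW_SECONDS) -> int:
    """Map a device ID to a stable bucket index in [0, buckets)."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    # Take the first 8 bytes as an unsigned integer, then reduce to a bucket.
    return int.from_bytes(digest[:8], "big") % buckets


def next_connect_time(device_id: str, window_start_utc: datetime) -> datetime:
    """Return the GMT/UTC instant at which this device should connect."""
    return window_start_utc + timedelta(seconds=bucket_for(device_id))


# Hypothetical window opening at 01:00 GMT.
start = datetime(2024, 1, 1, 1, 0, tzinfo=timezone.utc)
print(next_connect_time("device-0001", start))
```

The device's own time zone only matters when you translate the returned instant into whatever form your protocol sends to the device; the slot itself is computed entirely in GMT.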
This approach improves on the earlier answer's random access times: a good hashing function spreads the device IDs approximately evenly across the buckets, so your access pattern will be approximately flat, and each device always lands in the same slot. Times drawn from a random function, by comparison, are likely to exhibit occasional spikes.
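If you want to check how flat the result is for your own fleet before rolling it out, a rough sketch like the following counts how many devices land in each bucket. The 18000 sequential IDs and the 180 one-minute buckets are made-up numbers for illustration, as are the function names; substitute your real device IDs and window size.

```python
import hashlib
from collections import Counter


def bucket_for(device_id, buckets):
    """Stable bucket index in [0, buckets) derived from a SHA-256 digest."""
    digest = hashlib.sha256(str(device_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def load_profile(device_ids, buckets):
    """Return (min, max, mean) devices per bucket, counting empty buckets."""
    counts = Counter(bucket_for(d, buckets) for d in device_ids)
    loads = [counts.get(b, 0) for b in range(buckets)]
    return min(loads), max(loads), sum(loads) / buckets


# Hypothetical fleet: 18000 devices over a 3-hour window of 1-minute buckets.
lo, hi, mean = load_profile(range(18000), 180)
print(f"min={lo} max={hi} mean={mean:.1f}")
```

A maximum far above the mean would suggest the hashing function (or the bucket count) is a poor fit for your IDs.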
Here’s some more specific information on how to implement various functions:
http://www.partow.net/programming/hashfunctions/index.html