I’m trying to generate an even distribution of random numbers based on user IDs. That is, I want a random number for each user that stays the same every time that user requests it (but the user doesn’t need to store the number). My current PHP code for measuring the distribution, given a large array of user IDs $arr, is:
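$range = 100;
$results = array_fill(0, $range, 0);    // one counter per bucket 0..99
foreach ($arr as $userID) {
    $hash = sha1($userID, TRUE);         // raw 20-byte digest
    $data = unpack('L*', $hash);         // five 32-bit integers (machine byte order)
    $seed = 0;
    foreach ($data as $integer) {
        $seed ^= $integer;               // fold the five words into one seed
    }
    srand($seed);                        // reseed the PRNG with the per-user seed
    ++$results[rand(0, $range - 1)];     // count which bucket this user lands in
}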
One would hope that this generates an approximately even distribution. But it doesn’t! I’ve checked to make sure that each value in $arr is unique, but one bucket in $results always gets far more hits than all the others. Is there a better method of hashing a string that gives an approximately even distribution? Apparently SHA-1 is not up to the job. I’ve also tried MD5 and a simple crc32, all with the same result!?
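One way to find out whether the hash is really to blame is to bucket each ID directly from its digest, leaving srand() and rand() out entirely. This is a diagnostic sketch, not a fix: it assumes string IDs, and the helper name hashBucket() and the sample IDs "1".."100000" are hypothetical. It masks the first four digest bytes to 31 bits so the value stays non-negative even on 32-bit PHP builds, where PHP integers are signed.

```php
<?php
// Diagnostic: bucket each ID straight from its SHA-1 digest,
// with no srand()/rand() step in between.
function hashBucket(string $userID, int $range): int
{
    $hash = sha1($userID, true);                       // 20 raw bytes
    // 'N' = unsigned 32-bit big-endian; use only the first 4 bytes,
    // masked to 31 bits so the result is non-negative on 32-bit builds too.
    $word = unpack('N', substr($hash, 0, 4))[1] & 0x7FFFFFFF;
    return $word % $range;                             // modulo bias is negligible for $range = 100
}

$range = 100;
$results = array_fill(0, $range, 0);
for ($i = 1; $i <= 100000; $i++) {                    // hypothetical sample IDs
    ++$results[hashBucket((string) $i, $range)];
}
printf("min %d, max %d\n", min($results), max($results));
```

If this histogram comes out flat while the original one does not, the skew is being introduced by the seeding step rather than by SHA-1, MD5, or crc32.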
Am I crazy? Is the only explanation that I have not, in fact, verified that each entry in $arr is unique?
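Uniqueness is cheap to confirm in code. A minimal sketch, using a small hypothetical $arr in place of the real data: comparing counts before and after array_unique() tells you whether duplicates exist, and array_count_values() shows which IDs repeat.

```php
<?php
$arr = ['u1', 'u2', 'u3', 'u2'];            // hypothetical sample data

// True only if no value appears more than once.
$allUnique = count($arr) === count(array_unique($arr));

// Map of each repeated ID to how many times it occurs.
$dupes = array_filter(array_count_values($arr), function ($n) {
    return $n > 1;
});

var_dump($allUnique);   // bool(false)
print_r($dupes);        // lists u2 with a count of 2
```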
mt_rand() should give a very even distribution over the requested range. When a user is created, generate a random seed for that user with mt_rand(), store it, and then always call mt_srand() with that seed for that user. To get an even distribution from 0 to 99, as in your example, just call mt_rand(0, $range-1). Doing tricks with sha1, md5, or some other hashing algorithm won’t really give you a more even distribution than straight random.
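A minimal sketch of the store-a-seed approach described above, assuming PHP 7 or later. The helper names createUserSeed() and userNumber() and the in-memory $seedStore array are hypothetical; in a real application the seed would be saved with the user’s record.

```php
<?php
// Generate a seed once, when the user is created.
function createUserSeed(): int
{
    // mt_getrandmax() is the largest value mt_rand() can return.
    return mt_rand(0, mt_getrandmax());
}

// Reseed with the stored seed so the same user always gets the same number.
function userNumber(int $seed, int $range): int
{
    mt_srand($seed);                 // same seed => same sequence
    return mt_rand(0, $range - 1);
}

$range = 100;
$seedStore = [];                     // hypothetical: user ID => stored seed
foreach (['alice', 'bob', 'carol'] as $userID) {
    $seedStore[$userID] = createUserSeed();
}

$first  = userNumber($seedStore['alice'], $range);
$second = userNumber($seedStore['alice'], $range);
var_dump($first === $second);        // bool(true)
```

Two caveats: mt_srand() reseeds the generator globally, so any later mt_rand() call in the same request continues from that user’s sequence; and PHP 7.1.0 changed the Mersenne Twister implementation, so a given seed can produce different numbers than on PHP 7.0 and earlier (mt_srand($seed, MT_RAND_PHP) restores the old behaviour).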