An array shall be sorted high to low by its values.
<?php
$items = array(
1 => f(1),
2 => f(2),
3 => f(3),
4 => f(4),
5 => f(5),
);
?>
After sorting, I check which of the items 1, 2, 3, 4, 5 comes first. I repeat that many times.
Afterwards
- 5 should be the first item five times as often as 1
- 4 should be the first item four times as often as 1
- 3 should be the first item three times as often as 1
- 4 should be the first item twice as often as 2
- …
One idea is
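<?php
// Uniform random number in (0, 1]; the lower bound of 1 avoids division by zero.
function random() {
    return mt_rand(1, mt_getrandmax()) / mt_getrandmax();
}
function f($key) {
    return $key / random();
}
?>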
which, for 1'000'000 tries, resulted in
key | times on top | ratio to key 1 | expected ratio
----+--------------+----------------+---------------
5   | 374'365      | 6.75           | 5
4   | 267'863      | 4.83           | 4
3   | 185'707      | 3.35           | 3
2   | 116'618      | 2.10           | 2
1   | 55'447       | 1              | 1
Looks weird to me, but maybe
- there is a simple problem with f?
- there is a better f?
My implementation:
<?php
abstract class Test {
    private $result;

    // The scoring function under test.
    protected abstract function f($x);

    // One trial: score the five keys, sort descending, record the top key.
    protected function iteration() {
        $values = array(
            1 => $this->f(1),
            2 => $this->f(2),
            3 => $this->f(3),
            4 => $this->f(4),
            5 => $this->f(5),
        );
        arsort($values);
        $top = key($values);
        if (!isset($this->result[$top])) {
            $this->result[$top] = 1;
        } else {
            $this->result[$top]++;
        }
    }

    // Run the trial $iterations times and return the counts, highest first.
    public function run($iterations) {
        $this->result = array();
        for ($i = 0; $i < $iterations; $i++) {
            $this->iteration();
        }
        arsort($this->result);
        return $this->result;
    }
}

class MyTest extends Test {
    protected function f($x) {
        // Lower bound of 1 avoids a division by zero when rand() returns 0.
        return $x / rand(1, getrandmax());
    }
}

$test = new MyTest();
$result = $test->run(1000 * 1000);
print_r($result);
printf("Ratio of key 5 to 1, which should be 5: %f\n", $result[5] / $result[1]);
?>
I have also tried a billion rounds, but the ratio is again 6.75. The whole point is: why isn't it five?
The results for
<?php
class BetterRandomGeneratorTest extends Test {
    protected function f($x) {
        // Lower bound of 1 avoids a division by zero when mt_rand() returns 0.
        return $x / mt_rand(1, mt_getrandmax());
    }
}
?>
are
Array
(
[5] => 3742816
[4] => 2674352
[3] => 1861444
[2] => 1168333
[1] => 553055
)
Ratio of key 5 to 1: 6.767529
Here is a simple f which will do it.
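A minimal sketch of such an f, assuming it returns the largest of $key independent uniform draws, which is what the explanation that follows describes. The helpers random() and countTops() are hypothetical names introduced for this sketch and are not part of the original code:

```php
<?php
// Uniform random number in (0, 1]. Hypothetical helper built on PHP's mt_rand().
function random(): float {
    return mt_rand(1, mt_getrandmax()) / mt_getrandmax();
}

// f($key): the largest of $key independent uniform draws.
// Across keys 1..5 this uses 1 + 2 + 3 + 4 + 5 = 15 draws in total.
function f(int $key): float {
    $best = 0.0;
    for ($i = 0; $i < $key; $i++) {
        $best = max($best, random());
    }
    return $best;
}

// Count how often each key ends up first after sorting high to low.
function countTops(int $iterations): array {
    $tops = array_fill(1, 5, 0);
    for ($n = 0; $n < $iterations; $n++) {
        $values = [];
        for ($key = 1; $key <= 5; $key++) {
            $values[$key] = f($key);
        }
        arsort($values);
        $tops[array_key_first($values)]++;
    }
    return $tops;
}

$tops = countTops(200000);
print_r($tops);
printf("Ratio of key 5 to 1: %f\n", $tops[5] / $tops[1]);
```

With this f, the printed ratio should settle near 5 as the number of iterations grows.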
This is guaranteed to work because the max is equally likely to be any of the 15 random numbers chosen, and 1/3 of the time that number will be in f(5), versus 1/15 for f(1).
As for what was wrong with your f, it is quite simple. Pairwise, your solution behaves predictably: f(1) < f(5) exactly 90% of the time, because f(1) only wins when the random() drawn for f(5) is more than five times the one drawn for f(1), which has probability 1/10. However, f(1) tends to be bigger than f(5) when f(1) is larger than average and f(5) is smaller than average. Ditto for f(2), f(3) and f(4). But it is unusual for all of f(2), ..., f(5) to be small at once. This causes correlations that make f(1) the largest less often than you would naively think. Conversely, the correlations come out in favor of f(5) more often than you would naively think.
If you want to compute the exact probabilities of each number coming out on top, it shouldn't be too hard to compute exact answers with integration. The idea is that you integrate, from 0 to 1, the probability that f(i) is the maximum given that random() for f(i) took the value x. (So, for instance, for 5 you would integrate (1-x/5)(1-2x/5)(1-3x/5)(1-4x/5), while for 1 you would integrate a function that is 0 if random() is bigger than 0.2, and otherwise is (1-2x)(1-3x)(1-4x)(1-5x).) The expressions will be complicated, and the ratios won't come out to nice answers.
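The integration described above can also be approximated numerically. The following is a minimal sketch, assuming f(k) = k / random() with random() uniform on (0, 1). topProbability() is a hypothetical helper name, and the midpoint rule stands in for exact integration:

```php
<?php
// Probability that key $k comes out on top when f(k) = k / U_k,
// with each U uniform on (0, 1).
// Given U_k = x, key j loses to key k exactly when U_j > j * x / k,
// which has probability max(0, 1 - j * x / k).
function topProbability(int $k, array $keys, int $steps = 100000): float {
    $h = 1.0 / $steps;
    $sum = 0.0;
    for ($i = 0; $i < $steps; $i++) {
        $x = ($i + 0.5) * $h; // midpoint of the i-th slice of [0, 1]
        $p = 1.0;
        foreach ($keys as $j) {
            if ($j === $k) {
                continue;
            }
            $p *= max(0.0, 1.0 - $j * $x / $k);
        }
        $sum += $p;
    }
    return $sum * $h;
}

$keys = [1, 2, 3, 4, 5];
$probs = [];
foreach ($keys as $k) {
    $probs[$k] = topProbability($k, $keys);
    printf("key %d: %.4f\n", $k, $probs[$k]);
}
printf("Ratio of key 5 to 1: %.2f\n", $probs[5] / $probs[1]);
```

Under these assumptions the script should report roughly 0.374 for key 5 and 0.055 for key 1, a ratio of about 6.76, in line with the simulated counts.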