For an educational website, I want to let students experiment with series of values and their correlation. For instance, students can enter two arrays, for which the correlation is calculated:
$array_x = array(5,3,6,7,4,2,9,5);
$array_y = array(4,3,4,8,3,2,10,5);
echo Correlation($array_x, $array_y); // 0.93439982209434
The code for this works perfectly and can be found at the bottom of this post. However, I’m now facing a challenge. What I want is the following:
- student inputs an $array_x (5,3,6,7,4,2,9,5)
- student inputs a correlation (0.9)
- student inputs the boundaries of $array_y (for instance, between 1 and 10 or between 50 and 80)
- the script returns a random array (for instance: 4,3,4,8,3,2,10,5) which has (about) the given correlation
So, in other words, the code would have to work like:
$array_x = array(5,3,6,7,4,2,9,5);
$boundaries = array(1, 10);
$correlation = 0.9;
print_r(ySeries($array_x, $boundaries, $correlation)); // for instance: array(4,3,4,8,3,2,10,5)
At the Stack Exchange Math forum, @ilya answered (inserted as an image, since LaTeX formatting of formulas doesn’t seem to work on Stack Overflow):

P.S. The code used to calculate the correlation:
function Correlation($arr1, $arr2) {
    $correlation = 0;
    $k = SumProductMeanDeviation($arr1, $arr2);
    $ssmd1 = SumSquareMeanDeviation($arr1);
    $ssmd2 = SumSquareMeanDeviation($arr2);
    $product = $ssmd1 * $ssmd2;
    $res = sqrt($product);
    $correlation = $k / $res;
    return $correlation;
}
function SumProductMeanDeviation($arr1, $arr2) {
    $sum = 0;
    $num = count($arr1);
    for ($i = 0; $i < $num; $i++) {
        $sum = $sum + ProductMeanDeviation($arr1, $arr2, $i);
    }
    return $sum;
}
function ProductMeanDeviation($arr1, $arr2, $item) {
    return (MeanDeviation($arr1, $item) * MeanDeviation($arr2, $item));
}
function SumSquareMeanDeviation($arr) {
    $sum = 0;
    $num = count($arr);
    for ($i = 0; $i < $num; $i++) {
        $sum = $sum + SquareMeanDeviation($arr, $i);
    }
    return $sum;
}
function SquareMeanDeviation($arr, $item) {
    return MeanDeviation($arr, $item) * MeanDeviation($arr, $item);
}
function SumMeanDeviation($arr) {
    $sum = 0;
    $num = count($arr);
    for ($i = 0; $i < $num; $i++) {
        $sum = $sum + MeanDeviation($arr, $i);
    }
    return $sum;
}
function MeanDeviation($arr, $item) {
    $average = Average($arr);
    return $arr[$item] - $average;
}
function Average($arr) {
    $sum = Sum($arr);
    $num = count($arr);
    return $sum / $num;
}
function Sum($arr) {
    return array_sum($arr);
}
So, here’s the PHP implementation of your algorithm, which uses Dawkins’ weasel to reduce the error gradually until it falls below the desired threshold.
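A minimal sketch of how such a weasel-style (mutate-and-select) search could look, not necessarily the code this answer refers to. It starts from a random series inside the boundaries, changes one random element at a time, and keeps the change only if the correlation gets no further from the target. The names `pearson`, `$tolerance` and `$maxSteps` are assumptions made for this illustration, `ySeries` is the hypothetical function from the question, and the boundaries are assumed to be integers.

```php
<?php
// Pearson correlation of two equal-length numeric arrays.
function pearson(array $x, array $y): float {
    $n = count($x);
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;
    $sxy = $sxx = $syy = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $dx = $x[$i] - $meanX;
        $dy = $y[$i] - $meanY;
        $sxy += $dx * $dy;
        $sxx += $dx * $dx;
        $syy += $dy * $dy;
    }
    $den = sqrt($sxx * $syy);
    // A constant series has no defined correlation; treat it as 0 here.
    return $den > 0 ? $sxy / $den : 0.0;
}

// Hypothetical ySeries(): mutate-and-select search for a y series whose
// correlation with $x is within $tolerance of $target.
// $tolerance and $maxSteps are illustrative defaults, not tuned values.
function ySeries(array $x, array $bounds, float $target,
                 float $tolerance = 0.01, int $maxSteps = 100000): array {
    [$min, $max] = $bounds; // assumed to be integers with $min <= $max
    $n = count($x);

    // Start from a completely random series inside the boundaries.
    $y = [];
    for ($i = 0; $i < $n; $i++) {
        $y[] = mt_rand($min, $max);
    }
    $error = abs(pearson($x, $y) - $target);

    for ($step = 0; $step < $maxSteps && $error > $tolerance; $step++) {
        // Mutate: replace one randomly chosen element.
        $candidate = $y;
        $candidate[mt_rand(0, $n - 1)] = mt_rand($min, $max);

        // Select: keep the mutation only if it is at least as good.
        $candidateError = abs(pearson($x, $candidate) - $target);
        if ($candidateError <= $error) {
            $y = $candidate;
            $error = $candidateError;
        }
    }
    // May still be outside the tolerance if $maxSteps ran out.
    return $y;
}
```

Called as `print_r(ySeries(array(5,3,6,7,4,2,9,5), array(1, 10), 0.9));`, this prints a different series on each run. Because the search can run out of steps or stall, it is worth checking the result with `Correlation()` before showing it to the student.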