I have the below file formatted in the fashion shown below –
DATA THROUGH 2001 YRS JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC ANN
BIRMINGHAM AL 58 8.1 8.7 9.0 8.2 6.8 6.0 5.7 5.4 6.3 6.2 7.2 7.7 7.1
HUNTSVILLE AL 34 9.1 9.4 9.8 9.2 7.9 6.9 6.1 5.8 6.7 7.3 8.1 9.0 7.9
MOBILE AL 53 10.1 10.3 10.7 10.1 8.7 7.5 6.9 6.7 7.7 8.0 8.9 9.6 8.8
MONTGOMERY AL 57 7.7 8.2 8.3 7.3 6.1 5.8 5.7 5.2 5.9 5.7 6.5 7.1 6.6
ANCHORAGE AK 48 6.4 6.8 7.1 7.3 8.5 8.4 7.3 6.9 6.7 6.7 6.4 6.3 7.1
ANNETTE AK 37 11.7 11.9 10.6 10.7 9.0 8.5 7.8 8.1 8.9 11.4 11.8 12.0 10.2
BARROW AK 68 11.9 11.3 11.3 11.5 12.0 11.5 11.7 12.4 13.2 13.3 12.4 11.7 12.0
BARTER IS. AK 33 15.1 14.4 13.7 12.0 12.7 11.6 10.9 11.8 13.2 14.8 14.9 13.9 13.2
BETHEL AK 43 14.5 14.8 13.8 12.9 11.5 11.0 10.7 11.0 11.6 12.3 13.2 13.7 12.6
BETTLES AK 26 5.9 6.3 7.1 7.5 7.2 6.8 6.6 6.2 6.4 6.4 5.8 5.7 6.5
I need to calculate the variance for every state except alaska represented by AK .
The formula for the variance is : nƩ(x^2) -(Ʃ x^2) / n(n-1).
All I have now is some code like
BEGIN {FS ="\t"} {if ($2 != "AK")
{
sum1 = sum1 + ($4)^2
sum2 = sum2 + $4
n = n+1
if($2 != "AK")
{
sum1 = sum1 * n;
sum3 = sum1 - (sum2)^2
sum3 = sum3 / (n*(n-1));
}
}} END {}
BEGIN {FS ="\t"} {if ($2 != "AK")
{
sum1 = sum1 * n;
sum3 = sum1 - (sum2)^2
sum3 = sum3 / (n*(n-1));
}}
END {print "average " sum3}
i get a division by zero error
I am short of ideas as to how to compute this formula . Any suggestions on breaking the code down – finding the pattersn and or complete code will be appreciated.
This sums the squares of column 10 when the state is not AK. Then uses your formula.