Algorithms for calculating variance

   

In statistics, a formula for calculating the variance of a population of size n is:

<math>\mathrm{Variance} = \frac {n\sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2}{n^2}<math>

A formula for calculating the unbiased estimation of the population variance from n finite samples is:

<math>\mathrm{Variance} = \frac {n\sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2}{n(n-1)}<math>

The method of calculation may be more easily understood from the table below where the mean is 8.

i xi xi − mean (xi − mean)2
(index) (datum) (deviation) (squared deviation)
1 5 −3 9
2 7 −1 1
3 8 0 0
4 10 2 4
5 10 2 4
n = 5 sum = 40 0 18
  • mean = 40/5 = 8
  • variance = (5 · 338 − 402)/(5 · 4) = 4.5
  • standard deviation = <math>\sqrt{\mathrm{Variance}} = 2.12<math>

Note: Details of the variance calculation:

338 = [52 + 72 + 82 + 102 + 102]
40 = [5 + 7 + 8 + 10 + 10]

Algorithm

Therefore a simple algorithm to calculate variance can be described by the following pseudocode:

double sum;
double sum_sqr;
double variance;
long n = data.length; // the number of elements in the data array (the actual syntax is language-specific)

for i = 1 to n
 sum += data[i];
 sum_sqr += ( data[i] * data[i] );
end for

variance = ((n * sum_sqr) - (sum * sum))/(n*(n-1));

Algorithm

Another algorithm which avoids large numbers in sum_sqr while summing up

double avg = 0;
double var = 0;
long n = data.length; // number of elements

for i = 1 to n
 avg = (avg*i + data[i]) / (i + 1);
 var = (var * (i - 1) + (data[i] - avg)*(data[i] - avg)) / i;
end for

return var; // resulting variance


Retrieved from "http://www.centipedia.com/articles/Algorithms_for_calculating_variance"

This page has been accessed 868 times. This page was last modified 03:32, 12 Nov 2004. All text is available under the terms of the GNU Free Documentation License (see Copyrights for details).