I think the expression for a running mean/average in
iterative/pseudocode form would be something like:
x = new data value
sum = sum + x
count = count +1
running_mean = sum/count
repeat...
what would be the iterative/pseudocode expression for a running
variance?
by 'running' I mean that I have a list that I periodically append to,
and I cannot store all the previous values. for running mean, I don't
store x[1], x[2], etc. just thier sum and the count. a link to
straight answers like this would be appreciated. I get hundreds of
hits on every math topic I try, but no pages get right to the point of
stuff like the question above.
I'm under the impression that the answer to this is perfectly well
known and a knowledgable person would know off the top of their head
or would know right where to look. if you know you can provide the
answer, but it will require real work, give me some idea of the effort
required so I can price it better.
One quick point that I forgot to mention.
Java, like C and C++, has 0-based indexing in arrays; while your
pseudocode, like Fortran, Matlab, and a lot of pseudocode, uses
1-based indexing on arrays; in any case, this is the reason that 1 is
added to the denominator in the expressions for the mean and mean of
the squares in the code given. Phys. Rev. A 31, 2248 (1985): Bauche-Arnoult et al. - Variance of :: The complete expression of the variance a2 for the subarray 1NNj '-lN" is listed .. the summations now run over all eigenvectors a- jN+lqIJM) (9) b = [i! http://link.aps.org/doi/10.1103/PhysRevA.31.2248HOME |
Thank you for the question.
As always, if you have any questions or need additional information,
please use the "Request Clarification" button on the browser before
rating this answer.
The variance of a random variable X is the
E(X^2)-E(X)^2 .
(see e.g. http://mathworld.wolfram.com/Variance.html).
Therefore, a running variance can be computed simply by using running
means of the original data and the squares of the original data.
Below is a Java program that shows the implementation in a method
called "variances" that returns the running variances of an array of
data.
The test program generates an array of data from a uniform
distribution, whose variance should be 1/12, (
http://mathworld.wolfram.com/UniformDistribution.html ) .
Here is a sample script:
%javac Variance.java
% java Variance 100
Expected variance of: 0.08333333333333333 got variance of:
0.0840287590180841
% java Variance 1000
Expected variance of: 0.08333333333333333 got variance of:
0.08806610259355246
% java Variance 10000
Expected variance of: 0.08333333333333333 got variance of:
0.08249038698065755
% java Variance 1000000
Expected variance of: 0.08333333333333333 got variance of:
0.0833870225438409
Here is the code (note: sometimes code does not render properly. If
this is a problem, I can put it on a website).
public class Variance{
public static double variances(double data){
double sum=0;
double sumsquare=0;
int len=data.length;
double mean=0;
double variances=new double[len];
for (int i=0;i0)len=Integer.parseInt(args[0]);
doubledata=new double[len];
for (int i=0;i
I think the price might be a little too low.
Your analysis is correct. However, it is not a good idea to use the
symbol E for summation, since in probability theory E means
expectation, or mean (what you use double angle brackets for).
do you mean that it's a difficult question?
so, let me just see if I've got this right (expressed
as I'm used to seeing it)
running variance = 1/N * E(x^2) - ( E(x)/n )^2
(where E means summation)
or: v = - ^2
so I could use:
x = new data value
sum = sum + x
sumofsquares = sumofsquares + x^2
count = count +1
running_mean = sum/count
running_variance = 1/count * sumofsquares - running_mean^2
repeat...
?
The problem with
sum_var = sum_var + (x - running_mean)^2
is that running_mean changes at each iteration, so all of the previous
summands to sum_var would actually have to be re-calculated. The code
with these two lines added would still give a reasonable approximation
to the true variance at each stage, as long as the data come in an
unsorted order, but the approach given by rbnn gives the correct
variance at each stage. The sum of the squares is independent of the
order in which the data are given, and records enough information to
compute the variance because of linearity:
sum( (x_i - mean)^2 )
= sum( x_i^2 ) - sum( -2*x_i*mean) + sum(mean^2)
= sum(x_i^2) - 2*mean*sum(x_i) + mean*mean*count
= sum(x_i^2) - mean*mean*count
Would it work to just add a couple lines to your running mean code?
sum_var = sum_var + (x - running_mean)^2
running_var = sum_var/count
How much does getting a small tattoo on your hip/stomach hurt?
Do anyone else have an itchy anus? ?
|