Whatever sums your floats (Part II)

A while ago, Martin Capoušek drew my attention to Kahan’s summation algorithm and asked how it compares to the other methods I presented back then.

Well, let’s find out.

Let’s recall. The basic problem is that since IEEE-754 floats are limited in precision, it’s hard to get the sum of a large number of values right. Of course, you get effects that you may think of as “rounding”, but it’s much worse than that. For example, you can have a+b==a, if b is much smaller than a. So in order to minimize rounding, you should always sum numbers of very comparable magnitude. And rounding errors propagate and may, or may not, cancel each other out.
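
For instance, once a is large enough, adding 1 to it changes nothing at all. A tiny illustration (a throwaway snippet of mine, not from the original discussion):

#include <iostream>

int main()
 {
  float a=16777216.0f; // 2^24, the last point where a float still represents every integer
  float b=1.0f;

  std::cout << (a+b==a) << std::endl; // prints 1: b is absorbed entirely

  return 0;
 }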

Since they may not cancel each other out (rounding up one time, down another, averaging out to zero in the long run), Kahan proposed an algorithm that keeps track of the rounding error, in a way reminiscent of how codecs deal with the loss of precision due to quantization in differential PCM coding. The basic idea is to add the new value, corrected by the accumulated error, to the current sum, giving a new sum, then subtract the old sum back from it. With infinite precision, we would get the corrected value back exactly; with limited precision, we get its quantized, rounded, imprecise version instead, and the difference is the rounding error carried over to the next round.

A minimal implementation:

#include <vector>

float kahan_sum(const std::vector<float> & v)
 {
  float sum=0;
  float cumul_error=0; // running compensation for the low-order bits lost so far
  for (float vv: v)
   {
    float t1 = vv-cumul_error;   // correct the new value by the accumulated error
    float t2 = sum+t1;           // low-order bits of t1 may be lost here
    cumul_error=(t2-sum)-t1;     // what was actually added minus what we meant to add;
                                 // should be zero, or very close to.
    sum=t2;
   }

  return sum;
 }
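
To get a feel for the difference, here is a small test driver (my own, purely illustrative, and it assumes kahan_sum above is in scope): it sums ten million copies of 0.1f with plain left-to-right addition and with Kahan’s algorithm.

#include <vector>
#include <iostream>

// plain left-to-right summation, for comparison
float naive_sum(const std::vector<float> & v)
 {
  float sum=0;
  for (float vv: v) sum+=vv;
  return sum;
 }

int main()
 {
  std::vector<float> v(10000000,0.1f); // ten million copies of 0.1f

  std::cout << naive_sum(v) << std::endl; // drifts noticeably away from 1e6 as errors pile up
  std::cout << kahan_sum(v) << std::endl; // stays very close to 1e6 (up to the error in 0.1f itself)

  return 0;
 }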

*
* *

So, how does it compare to the other methods? If we rerun the same numbers as before, we get the following results:

as-is        sorted       pair         sorted pair   Kahan
7995938      7996406.5    7996777.5    7996777       7996777.5
8009019.5    8009517.5    8009789      8009789       8009789
8001316      8001631.5    8001917      8001917       8001916.5
8009463.5    8009750.5    8010206      8010206       8010206
7995282      7995591      7995851      7995852       7995851.5

It compares well with pair and sorted-pair summation, but without using a lot of extra memory and without sorting.
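
For reference, here is one way pairwise summation can be written; I am assuming “pair” means the usual recursive halving scheme, which may differ in detail from the variant used in the earlier post. Called as pair_sum(v.data(),v.size()) on a std::vector<float>.

#include <cstddef>

// recursive pairwise summation: split the array in two halves,
// sum each half separately, then add the two partial sums
float pair_sum(const float * v, std::size_t n)
 {
  if (n==0) return 0;
  if (n==1) return v[0];

  std::size_t h=n/2;
  return pair_sum(v,h)+pair_sum(v+h,n-h);
 }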

*
* *

A careful (and rather tedious) analysis of error propagation would show that the error, as for the other methods, can still be rather large. At worst, it can be as large as

\left(2\varepsilon+O(n\varepsilon^2)\right)\sum_{i=1}^{n}\left|x_i\right|,

where \varepsilon is the “machine precision”, something like 2^{-23}\approx{}1.19\times{10}^{-7} for floats.
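
For comparison (a standard result, not derived in the original post), plain left-to-right summation has a worst-case bound that grows with the number of terms,

O(n\varepsilon)\sum_{i=1}^{n}\left|x_i\right|,

so the bound for Kahan’s algorithm is, for all practical purposes, independent of n.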
