Numerical methods are generally rather hard to get right because of error propagation due to the limited precision of floats (or even doubles). This seems to be especially true with methods involving series, where a usually large number of ever diminishing terms must added to attain something that looks like convergence. Even fairly simple sequences such as

may be complex to evaluate. First, is cumbersome, and becomes small exceedingly rapidly.

So what can we do about it? The first idea is to use doubles instead of floats (or whatever floating number type offers more precision given the platform) but a better reflex would be to look for alternative representations that are better behaved. For example,

also computes , but converges about twice as fast. I guess you’re asking yourself where does that expression come from anyway. Fair enough, let’s see.

First, if we expand the first expression, we get

Let us now group terms two by two, we get, somewhere in the series terms of the form of

“Simplifying” the pair, we get

which you can readily verify using simple algebra. This yields the alternate series above (which has no immediate semblance to the first, while being strictly equivalent), with half the number of terms (an enumerably infinite number, but still converges twice as fast, as counted by sum iterations).

So, even if it should converge twice as fast, does it? And does it with increased precision or does it have worst error propagation? Well, let’s experiment! A C version gives:

float_x naive_e(int nb_terms) { // ok, not so naive float_x s=1; float_x n=1; for (int i=1;i<=nb_terms;i++) { n*=i; s+=1/n; } return s; } //////////////////////////////////////// float_x compressed_e(int nb_terms) { float_x s=2; float_x k_bang=1; for (int k=1;k<=nb_terms;k++) { float_x t=2*k; k_bang*=t*(t+1); s+=(t+2)/k_bang; } return s; }

With the functions parametrized by the number of iteration one is willing to give them to compute their magic. The actual value of is

Running both functions with an increasing number of iterations yields the following table:

it. Naive Compressed 1 1 2 2 2 2.666666666666667 3 2.5 2.716666666666666 4 2.666666666666667 2.718253968253968 5 2.708333333333333 2.718281525573192 6 2.716666666666666 2.718281826198492 7 2.718055555555555 2.718281828446758 8 2.718253968253968 2.718281828458994 9 2.718278769841270 2.718281828459045 10 2.718281525573192 2.718281828459045 11 2.718281801146385 2.718281828459045 12 2.718281826198493 2.718281828459045 13 2.718281828286169 2.718281828459045 14 2.718281828446759 2.718281828459045 15 2.718281828458230 2.718281828459045 16 2.718281828458995 2.718281828459045 17 2.718281828459043 2.718281828459045 18 2.718281828459046 2.718281828459045 19 2.718281828459046 2.718281828459045 20 2.718281828459046 2.718281828459045

where we see that the compressed version 1) gets its final value at iteration 9 (while the naive gets it at iteration 18) and that 2) it gets a better result than the naive version, with all significant digits right. Ok, so the converge being twice as fast is not exactly a surprise, but the increased precision might be. For one thing, the compressed version has a smaller magnitude difference between the numerators and denominators of the terms, making it better conditioned.

It is not clear that we always want to have an alternate expression for a series where the ratio of magnitudes of numerators and denominators is close to 1, but it is clear that it helps preventing terms going to zero (in a machine-specific way) too fast. doesn’t have to be very large for to be zero, as represented by floats, and goes to zero a bit slower.

We could have a look at another series, for example, , to see how compression affects it.

*To be continued…*

Hello,

That’s interesting. Now, performance-wise, you have for each iteration:

– for the naive: 1 integer multiplication, 1 float division, 1 float addition

– for the compressed: 3 integer multiplication, 2 int add, 1 float division, 1 float addition

Which means the compressed has 2 int mult and 2 int add more than the naive.

So, integrating this, would the compressed version still be faster?

“Faster” means here the number of iterations, not really the number of operations. Of course, what you say is right; it does need more operations per iterations, and on a in-order, non-superscalar CPU, it would probably be much slower per iteration. However, (and fortunately) most CPUs are superscalar and out-of-order, so as many instructions as possible are executed in parallel, and now it’s quite possible that the timings are comparable when we reach comparable precision.

And precision should be more important than speed (most of the times). The key point here is that vanishes to machine-precision much faster than, say, , and that’s what buys us extra precision.