Trigonometric Tables Reconsidered

A couple of months ago (already!) 0xjfdube produced an excellent piece on table-based trigonometric function computations. The basic idea was that you can look up the value of a trigonometric function rather than actually computing it, on the premise that computing such functions directly is inherently slow. Turns out that's actually the case, even on fancy CPUs. He discusses the precision of the estimate as a function of the size of the table and shows that you don't need a terribly large number of entries to get decent precision. He muses over the possibility of using interpolation to improve precision, speculating that it might be slow anyway.

I started thinking about how to interpolate efficiently between table entries, but then I realized that it's not fundamentally more complicated to compute a polynomial approximation of the function than to search the table and then use polynomial interpolation between entries.

The classical textbook example is to use a Taylor Series expansion of the function around the point of interest, let us say a. For \cos, it leads to quite a mess:

\cos x=\cos a-\sin(a)\,(x-a)-\frac{\cos a}{2}(x-a)^2+\frac{\sin a}{6}(x-a)^3+\cdots

Not only does it require knowing \cos a (and \sin a) for a suitable a near the x we want, but the expression doesn't factor particularly neatly either. If we instead dispense with a and expand around zero, we get a Maclaurin series, the special case of the Taylor series when a=0. For \cos, we find that:

\displaystyle \cos x=\sum_{i=0}^\infty \frac{(-1)^i}{(2i)!}x^{2i}

While this expression seems to require the calculation of large powers and factorials, a bit of factoring leads to a much nicer expression involving only small products and squaring. Indeed:

\displaystyle \cos x=1-\frac{x^2}{2!}+\frac{x^4}{4!}-\frac{x^6}{6!}+\cdots

which can be further factorized:

\displaystyle = 1-\frac{x^2}{2}\left(1-\frac{x^2}{4\times{3}}\left(1-\frac{x^2}{6\times{5}}\left(\cdots\right)\right)\right)

where \cdots means an infinite number of terms. This expansion means that the n-th nested term is given by

\displaystyle \frac{x^2}{(2n)(2n-1)}

which doesn’t look all that bad. Furthermore, since it’s the same x^2 everywhere, you can factor it and compute it once. A straightforward implementation would look something like:

// evaluates the factored Maclaurin series from the innermost term outward;
// n is the number of nested terms kept
inline float taylor(float x, int n)
 {
  const float x2=x*x;                    // x^2 is computed once and reused
  float s=1.0f;

  while (n)
   {
    const float n2=2.0f*n;
    s=1.0f-x2/(n2*(n2-1.0f))*s;          // s = 1 - x^2/((2n)(2n-1)) * s
    n--;
   }

  return s;
 }
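As a quick sanity check (my addition, not from the original post), one can compare the factored evaluation against the library cosine, assuming the taylor function above is in scope:

#include <cmath>
#include <cstdio>

// quick check of taylor() against the library cos on a few points in [0, pi/2]
int main()
 {
  for (int i=0; i<=10; i++)
   {
    const float x=(float)(M_PI/2*i/10.0);
    std::printf("x=%f  taylor=%f  cos=%f  diff=%e\n",
                x, taylor(x,3), std::cos(x), taylor(x,3)-std::cos(x));
   }
  return 0;
 }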

One may think of using different decompositions/approximations. In [1], we find the product expansion

\displaystyle \cos z=\prod_{k=1}^\infty\left(1-\frac{4z^2}{(2k-1)^2\pi^2}\right)

but we will see that it doesn't converge fast enough, for a reasonably small number of factors, to be interesting.
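To make the comparison concrete, here is a straightforward truncation of that product to k_max factors (a sketch of my own, not the code used for the plots below):

#include <cmath>

// truncation of the infinite product to k_max factors
inline float product_cos(float z, int k_max)
 {
  const float z2=4.0f*z*z;                 // 4z^2, computed once
  float p=1.0f;

  for (int k=1; k<=k_max; k++)
   {
    const float d=(2.0f*k-1.0f)*(float)M_PI;
    p*=1.0f-z2/(d*d);                      // factor 1 - 4z^2/((2k-1)^2 pi^2)
   }

  return p;
 }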

Since \cos is a rather smooth function, one may think that it can be approximated with a rather low-order polynomial. Fitting a regular degree 4 polynomial, we obtain the approximation

\cos x \approx a_0+a_1 x^2 + a_2 x^4

where

a_0 \approx 0.999579,

a_1 \approx -0.496381,

and

a_2 \approx 0.0372019,

which I found using the Fit function in Mathematica against \cos x with 0 \leqslant x \leqslant \frac{\pi}{2}. The fitted terms for x and x^3 are exceedingly small; small enough to be ignored. Why just 0 \leqslant x \leqslant \frac{\pi}{2}?

[Figure: Taylor Series Approximations to Cosine]

Because the Taylor expansion does very well on the range 0 \leqslant x \leqslant \frac{\pi}{2}, as the image above shows: black is the true cosine, red is the Taylor series with only two terms, orange with three, and green with seven. The purple line is the product expansion discussed above with something like 12 factors; as I said, it converges extremely slowly. So it seems reasonable to compare on the range 0 \leqslant x \leqslant \frac{\pi}{2}. And this is not that much of a limitation, as we will discuss later on.

The low-degree polynomial function is given by

float poly(float x, int)
 {
  // the unused int parameter keeps the same (float,int) signature as taylor()
  const float x2=x*x;
  return 0.999579f - 0.496381f*x2 + 0.0372019f*x2*x2;
 }

(Note the f suffixes, used to avoid promoting the constants to double.)
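The fit itself is easy to reproduce without Mathematica. Here is a minimal sketch of my own that solves the normal equations for the even basis \{1, x^2, x^4\} over points sampled in 0 \leqslant x \leqslant \frac{\pi}{2}; the constants it produces come out close to, but not exactly, the ones above (the sampling is my own choice):

#include <cmath>
#include <cstdio>

// least-squares fit of cos x by a0 + a1*x^2 + a2*x^4 over samples in [0, pi/2],
// via the normal equations and a tiny Gaussian elimination
int main()
 {
  const int samples=1000;
  double M[3][4]={};                       // augmented system [ A^T A | A^T y ]

  for (int s=0; s<=samples; s++)
   {
    const double x=(M_PI/2)*s/samples;
    const double phi[3]={1.0, x*x, x*x*x*x};
    for (int i=0; i<3; i++)
     {
      for (int j=0; j<3; j++) M[i][j]+=phi[i]*phi[j];
      M[i][3]+=phi[i]*std::cos(x);
     }
   }

  // forward elimination (no pivoting; the Gram matrix is well behaved here)
  for (int i=0; i<3; i++)
   for (int k=i+1; k<3; k++)
    {
     const double f=M[k][i]/M[i][i];
     for (int j=i; j<4; j++) M[k][j]-=f*M[i][j];
    }

  // back substitution
  double a[3];
  for (int i=2; i>=0; i--)
   {
    a[i]=M[i][3];
    for (int j=i+1; j<3; j++) a[i]-=M[i][j]*a[j];
    a[i]/=M[i][i];
   }

  std::printf("a0=%g  a1=%g  a2=%g\n", a[0], a[1], a[2]);
  return 0;
 }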

*
* *

What about speed? Are they faster than the <math.h> cos function? Fortunately, some are. For 1 million calls:

Method       Terms   Elapsed      Per Call
EmptyLoop      -     0.008891s    8.891e-09s
math.h         -     0.028873s    2.8873e-08s
Polynomial     -     0.011190s    1.119e-08s
Taylor         2     0.022102s    2.2102e-08s
Taylor         3     0.030911s    3.0911e-08s
Taylor         7     0.068710s    6.871e-08s
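A minimal harness along these lines is enough to reproduce this kind of measurement (a sketch of my own, not the exact benchmark that produced the numbers above; it assumes the taylor and poly functions shown earlier):

#include <cmath>
#include <cstdio>
#include <ctime>

// wrapper so the library cos fits the same (float,int) signature as taylor() and poly()
float libm_cos(float x, int) { return std::cos(x); }

// times n_calls evaluations of f with n terms over arguments in [0, pi/2]
double time_it(float (*f)(float,int), int n, int n_calls)
 {
  volatile float sink=0.0f;                // keeps the calls from being optimized away
  const std::clock_t start=std::clock();

  for (int i=0; i<n_calls; i++)
   sink+=f((float)(M_PI/2*i/n_calls), n);

  return (double)(std::clock()-start)/CLOCKS_PER_SEC;
 }

int main()
 {
  const int calls=1000000;
  std::printf("math.h     %fs\n", time_it(libm_cos, 0, calls));
  std::printf("Polynomial %fs\n", time_it(poly,     0, calls));
  std::printf("Taylor 3   %fs\n", time_it(taylor,   3, calls));
  return 0;
 }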

So already at three terms, the Taylor series is slower than the CPU/<math.h> cos function. The polynomial approximation does rather well. Examining its assembly:

<_Z4polyfi>:
0f 28 c8                movaps xmm1,xmm0
f3 0f 10 15 d1 07 00 00 movss  xmm2,DWORD PTR [rip+0x7d1]
f3 0f 59 c8             mulss  xmm1,xmm0
f3 0f 10 05 c9 07 00 00 movss  xmm0,DWORD PTR [rip+0x7c9]
f3 0f 59 d1             mulss  xmm2,xmm1
f3 0f 5c c2             subss  xmm0,xmm2
f3 0f 10 15 bd 07 00 00 movss  xmm2,DWORD PTR [rip+0x7bd]
f3 0f 59 d1             mulss  xmm2,xmm1
f3 0f 59 d1             mulss  xmm2,xmm1
f3 0f 58 c2             addss  xmm0,xmm2
c3                      ret

we see that g++ (4.6.1, for this experiment) uses SSE instructions to perform float arithmetic (it turns out that if I force 387-style instructions only, it's slightly faster, but, meh). The code is short (10 instructions, plus the return) and doesn't use prefixes (with double, each instruction would have a 66 prefix, which seems to be more costly to decode, in addition to taking longer to execute with double-precision arithmetic). The memory references are to the three constants.

*
* *

What about precision? The maximum error is probably what interests us the most. The maximum errors for the different methods are summarized in the following table:

Method       Terms   Max Error
Taylor         2     0.0198623
Taylor         3     0.0008873
Taylor         7     1.07e-7
Polynomial     -     0.0012689

So Taylor with three terms, which compares more or less with the CPU/library cos function in terms of time, has a relatively high error, O(10^{-3}) (yes, yes, I know, big-O notation isn't for constants), only a bit smaller than the error of the much faster polynomial approximation, which also lies in the same O(10^{-3}) range.
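For the record, maximum errors like these are easy to measure by scanning the interval densely against the library cos taken as the reference; a minimal sketch (my own, not the original measurement code):

#include <cmath>

// scans [0, pi/2] and returns the largest absolute deviation of f from the library cos
float max_error(float (*f)(float,int), int n)
 {
  const int steps=100000;
  float worst=0.0f;

  for (int i=0; i<=steps; i++)
   {
    const float x=(float)(M_PI/2*i/(double)steps);
    const float e=std::fabs(f(x,n)-std::cos(x));
    if (e>worst) worst=e;
   }

  return worst;
 }

Calling max_error(poly,0) and max_error(taylor,3) should land in the same ballpark as the table above.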

*
* *

What about arguments outside the range 0 \leqslant x \leqslant \frac{\pi}{2}? Well, those are easily handled once we observe that the complete period is in fact made of translations/mirrorings of the function on 0 \leqslant x \leqslant \frac{\pi}{2}. Handwavingly:

Le badly drawn cos still shows that each colored section is a translation/mirror of the first one. This is exploitable. If f is a function approximating cosine (or sine) on [0,\pi/2], the following function suffices to cover [0,2\pi]:

typedef float (*cos_func)(float,int);   // the (float,int) signature shared by taylor() and poly()

float mirror(cos_func f, int n, float x)
 {
  if (x<M_PI/2.0)
   return f(x,n);                        // first quarter: use f directly
  else if (x<M_PI)
   return -f(M_PI-x,n);                  // second quarter: mirrored and negated
  else if (x<3*M_PI/2)
   return -f(x-M_PI,n);                  // third quarter: translated and negated
  else
   return f(2*M_PI-x,n);                 // fourth quarter: mirrored
 }
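To handle arbitrary arguments rather than just [0,2\pi], one can first reduce the argument modulo 2\pi. A minimal sketch (my addition; note that for very large arguments the reduction itself loses precision):

#include <cmath>

// reduces an arbitrary argument into [0, 2*pi) before calling mirror()
float any_cos(cos_func f, int n, float x)
 {
  x=(float)std::fmod((double)x, 2*M_PI);
  if (x<0.0f)
   x+=(float)(2*M_PI);

  return mirror(f,n,x);
 }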

*
* *

So, what to conclude? My conclusion is to use the compiler-provided cos function and not worry about it, most of the time. Unless your CPU can do floats but can't do cos, in which case the answer depends on the precision you need. The Taylor series seems reasonably efficient (especially once factored properly, although this is more or less hand-waving since I didn't test an implementation where x^n and n! are computed explicitly) and precise enough; with three terms the error is O(10^{-3}), just a bit smaller than that of the polynomial approximation.

*
* *

Abramowitz & Stegun give slightly different constants for the polynomial approximation and claim O(10^{-4}) error. Not sure how they arrived at the slightly different constants. I would guess Mathematica is fairly accurate on least-mean-squares fits. Maybe their constants were minimized using L_1, or perhaps using a minimax (L_\infty) criterion, which would directly minimize the maximum error they report?

*
* *

I am sure we can find other very good approximations involving only inexpensive operations. But the very notion of "inexpensive operation" limits the class of functions we can explore for approximations. If we restrict ourselves to addition (and therefore subtraction) and multiplication, we pretty much consider only polynomials, with linear functions as a special case. If we allow division, we get the rational functions. As we keep adding allowable operations, the class of possible functions grows, but the complexity may exceed that of the original function, the one we're trying to avoid computing exactly.


[1] Milton Abramowitz, Irene A. Stegun, Handbook of Mathematical Functions: with Formulas, Graphs, and Mathematical Tables, Dover, 1965.
