Padé Approximants

Sometimes, you need to compute a function that’s complex, cumbersome, or which your favorite language doesn’t quite provide for. We may be interested in an exact implementation, or in a sufficiently good approximation. For the later, I often turn to Abramovitz and Stegun’s Handbook of Mathematical Function, a treasure trove of tables and approximations to hard functions. But how do they get all these approximations? The rational function approximations seemed the most mysterious of them all. Indeed, how can one come up with

e^x \approx 1-0.9664x+0.3536x^2

for e^x, for 0\leqslant x\leqslant\ln 2?

Well, turns out, it’s hard work, but there’s a twist to it. Let’s see how it’s done!

The first step is to find the Maclaurin (around zero) or the Taylor (around some number u) expansion of the function, f(x), you want to approximate. The Taylor expansion of a function f(x) around u is


First, you must remark that the terms \displaystyle\frac{f^{(i)}(u)}{i!} are constant, because they do not depend on x. Let’s call them the \hat{c}_i. Then, the (x-u)^i are only polynomials, because u is also a constant. With much work, you can rearrange the whole thing, gather all the \hat{c}_j constants together that goes with an x^i, and write them as

\displaystyle f(x)=\sum_{i=0}^\infty c_i x^i.

So basically: every function for which you can find a Taylor series, you can rewrite as a power series. It is this power series that we will approximate by a polynomial, or, more exactly, a rational function: a quotient of two polynomials.

Infinite sums and series are rather cumbersome, and we will want to cut it so a certain precision. Therefore, we choose some k that will get us good precision (that’ll obviously dependent on the function) and write

\displaystyle f(x)\approx\sum_{i=0}^k c_i x^i.

If k is well chosen and f(x) somewhat well behaved, the approximation error due to the truncation of the series will be small, or at least not objectionable for our application (you may need to experiment to find the k that’s good for you and your function). We will approximate the truncated series by

\displaystyle \sum_{i=0}^k c_i x^i=\frac{\sum_{i=0}^m a_i x^i}{\sum_{i=0}^n b_i x^i},

where the a_i and the b_j are for us to find. k, m and n are constrained by k=m+n. This will allow us to get m+n+1 equations in m+n+2 unknowns—but we will be able to solve it anyway! Rewriting the above as

\displaystyle \left(\sum_{i=0}^k c_i x^i\right)\left(\sum_{i=0}^n b_i x^i\right)=\left(\sum_{i=0}^m a_i x^i\right) (*),

we will, by gathering the members by powers of x, be able to get m+n+1 equations of the form

\displaystyle c_k \left(\sum_{j=0}^{\min(k,n)} b_{k-j}x^j\right)=a_k x^k

with a_j=0 if j>m. Then use these m+n-1 equations to solve for all the b_i and the a_i. This will get our solution. OK, let’s go through a worked-out example.

* *

Let’s choose an easy function, say e^x. We find that, around zero, the Taylor series is

\displaystyle e^x=1+x+\frac{1}{2!}x^2+\frac{1}{3!}x^3+\cdots

If we want m=3 and n=2 (let’s choose smallish numbers, otherwise we’ll still be here next week!), then we keep k=m+n terms. We write:

\displaystyle 1+x+\frac{1}{2!}x^2+\frac{1}{3!}x^3=\frac{a_0+a_1x+a_2x^2}{b_0+b_1x}.

By rearranging terms as in eq. (*), and carrying out the left hand side product, we get






Gathering powers together, we arrive at the set of equations





and all other equations are =0, and we ignore them. We know all the c_i, they’re given by the truncated Taylor series. Then fun begins. We have m+n+1 equations but m+n+2 unknowns, so we know that the solution won’t be unique, unless we decide on the value of one of the unknown, or decide to express the solution in terms of this one unknown. We could choose, say b_0 to be the “known unknown” and go through the solution as if b_0 is a constant rather than an unknown. This process is tedious, but it’s only normal substitution/elimination.


We arrive at:

\displaystyle b_1=-\frac{1}{3}b_0

\displaystyle a_0=b_0

\displaystyle a_1=\frac{2}{3}b_0

\displaystyle a_2=\frac{1}{6}b_0

Letting b_0=1 (because, why not), we get the approximation:

\displaystyle \hat{f}(x)=\frac{1+\frac{2}{3}x+\frac{1}{6}x^2}{1-\frac{1}{3}x}

We’re done!

* *

How does it look? We didn’t use very many terms and the Taylor series is around zero. Let’s use, say, Mathematica, and plot the difference:


We see that on the interval ]-1,1[ the error is very small. Of course, you can’t tell from the graph, but the maximum error between -0.5 and 0.5 is 0.0012, and goes a bit bigger on ]-1,1[ at about 0.016. With very few terms, that’s rather good. Want better precision? Just pick a larger m and n and have fun.

2 Responses to Padé Approximants

  1. David Binner says:

    Hello, Professor Pigeon.

    I found this post when searching for information about Pade approximations. Very informative post. Thorough and easy to understand. (Have you ever thought about adding some of this information to Wikipedia? Their page on this topic could use some fleshing out.)

    I am about to start working with Pade approximants and may even make a blog post of my own to share my work. At this point, it’s not planned to be a derivation; I am working with data points and am hoping that creating Pade approximants with the data might provide SOME insight as to what kind of function could be used to define the data.

    In other words, given a data set produced by some as-yet unknown function, does a Pade approximation provide ANY direction as to what function generated the data? That is the investigation I am about to start. Hopefully, it doesn’t lead me down a dead end.

    • Padé approximation doesn’t work all that well in the more general Least Mean Square paradigm. As the post explains, it basically approximates a truncated power series (which in this case is only a polynomial), while trying to inhibit the polynomial wiggles by using a more expressive function class, rational functions. Another limitation of this method is that it uses the Taylor series of a function around some point, which means that the convergence radius will be limited. That’s why people have looked into solutions based on more complex basis functions, such as splines and wavelets.

      In any case, it doesn’t really tell you anything about the original function, it merely expects the function to be approximated reasonnably well by a truncated power series. And it if doesn’t, well, though luck ¯\_(ツ)_/¯.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: