In the last four installments of this series, we have seen linear interpolation, cubic interpolation, Hermite splines, and lastly cardinal splines.

In this installment (which should be the last of the series; at least for a while), let us have a look at how we can implement these interpolations efficiently.

Let us consider cubic interpolation. We know that both Hermite and cardinal splines can be computed in essentially the same way cubic interpolation is computed, so we do not lose generality here. The matrices differ, but not the general method.

Cubic interpolation is done by solving a system of four equations in four unknowns, each equation corresponding to the polynomial evaluated at a specific position. Solving is done by inverting the resulting Vandermonde matrix (re-read the cubic interpolation entry if need be).

The Vandermonde matrix, for the special case where $x_0=-1$, $x_1=0$, $x_2=1$, and $x_3=2$ (which is also without loss of generality, surprisingly), is given by:

$$M=\begin{pmatrix} -1 & 1 & -1 & 1\\ 0 & 0 & 0 & 1\\ 1 & 1 & 1 & 1\\ 8 & 4 & 2 & 1 \end{pmatrix}$$

and its inverse by

$$M^{-1}=\frac{1}{6}\begin{pmatrix} -1 & 3 & -3 & 1\\ 3 & -6 & 3 & 0\\ -2 & -3 & 6 & -1\\ 0 & 6 & 0 & 0 \end{pmatrix}$$

To solve for the coefficients of the cubic polynomial $p(x)=ax^3+bx^2+cx+d$, we compute:

$$\begin{pmatrix} a\\ b\\ c\\ d \end{pmatrix} = M^{-1}\begin{pmatrix} y_0\\ y_1\\ y_2\\ y_3 \end{pmatrix}$$

A direct implementation in C/C++ would look something like:

////////////////////////////////////////
//
// The Vandermonde matrix for the cubic
// polynomial (with special case x=-1,0,1,2)
// is
//
//        | -1  1 -1  1 |
//        |  0  0  0  1 |
//        |  1  1  1  1 |
//        |  8  4  2  1 |
//
// and its inverse is
//
//        | -1  3 -3  1 |
//  (1/6) |  3 -6  3  0 |
//        | -2 -3  6 -1 |
//        |  0  6  0  0 |
//
float_x cubic_interpolation( float_x y_0,
                             float_x y_1,
                             float_x y_2,
                             float_x y_3,
                             float_x x)
 {
  float_x a = (  -y_0 + 3*y_1 - 3*y_2 + y_3)/6;
  float_x b = ( 3*y_0 - 6*y_1 + 3*y_2      )/6;
  float_x c = (-2*y_0 - 3*y_1 + 6*y_2 - y_3)/6;
  float_x d = y_1;

  // Horner!
  return ((a*x+b)*x+c)*x+d;
 }

This is a direct application of the matrix formula: we solve for the coefficients, then evaluate the polynomial at a given $x$ (with the polynomial being computed using Horner's method, which reduces the cost of evaluating the polynomial quite a lot).

This implementation is not very efficient because 1) it solves the system all over again for every $x$ (it should solve the system once and evaluate the polynomial separately) and 2) it does not take advantage of a matrix/vector or matrix/matrix multiply using a high-performance library such as, say, ATLAS or OpenBLAS.
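To address the first point, the solve and the evaluation can be split into two functions. Here is a minimal sketch in plain C (using `float` instead of the post's `float_x`, and hypothetical function names):

```c
#include <assert.h>
#include <math.h>

/* Solve for the coefficients (a, b, c, d) once per patch,
   using the inverse Vandermonde matrix (times 1/6). */
void cubic_coefficients(float y0, float y1, float y2, float y3,
                        float coeffs[4])
 {
  coeffs[0] = (  -y0 + 3*y1 - 3*y2 + y3)/6; /* a */
  coeffs[1] = ( 3*y0 - 6*y1 + 3*y2     )/6; /* b */
  coeffs[2] = (-2*y0 - 3*y1 + 6*y2 - y3)/6; /* c */
  coeffs[3] = y1;                           /* d */
 }

/* Evaluate the cubic at x using Horner's method; this can be
   called many times per patch without re-solving the system. */
float cubic_evaluate(const float coeffs[4], float x)
 {
  return ((coeffs[0]*x + coeffs[1])*x + coeffs[2])*x + coeffs[3];
 }
```

The solve costs a handful of multiplications and is amortized over however many evaluations the patch needs.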

In fact, we could take maximal advantage of such libraries by solving not only one, but many systems at the same time. How can we do this?

We find the parameters $(a_1,b_1,c_1,d_1)$ by computing $M^{-1}\vec{y}_1$ for some values $\vec{y}_1$, then we solve another system by computing $M^{-1}\vec{y}_2$, for another set of values $\vec{y}_2$. Now, we can notice that a matrix/matrix product works "by column". Indeed, if we compute the product $AB$ for two conformant matrices $A$ and $B$, we can break the product into a series of matrix/vector products, where we multiply one column of $B$ by the whole matrix $A$. In other words, if $C=AB$, then the $i$th column of $C$ is given by $A$ times the $i$th column of $B$.

This means that the columns of $C$ can be computed independently (so, no, I'm not going to tell you that you can compute them in parallel, which is true, but that's not the point I want to make). Therefore, if we bundle the $\vec{y}_i$s into a matrix, the product of $M^{-1}$ with that matrix can be seen as the concatenation of the solutions $M^{-1}\vec{y}_i$. OK, let us have a look at a Matlab/Octave experiment to show this property:

octave:1> m=[-1,1,-1,1;0,0,0,1;1,1,1,1;8,4,2,1]
m =

  -1   1  -1   1
   0   0   0   1
   1   1   1   1
   8   4   2   1

octave:2> mm=inverse(m)
mm =

  -0.16667   0.50000  -0.50000   0.16667
   0.50000  -1.00000   0.50000   0.00000
  -0.33333  -0.50000   1.00000  -0.16667
   0.00000   1.00000   0.00000   0.00000

octave:3> y1=[1;2;3;4]
y1 =

   1
   2
   3
   4

octave:4> y2=[3;5;7;2]
y2 =

   3
   5
   7
   2

octave:5> mm * y1
ans =

   0
   0
   1
   2

octave:6> mm * y2
ans =

  -1.16667
   0.00000
   3.16667
   5.00000

octave:7> mm * horzcat(y1,y2)
ans =

  -0.00000  -1.16667
   0.00000   0.00000
   1.00000   3.16667
   2.00000   5.00000

where `horzcat` (naturally) concatenates two matrix/vector objects horizontally.
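The same column-by-column property is easy to check in plain C as well. The following sketch (with hypothetical names, assuming row-major storage) lets us verify that $A$ times a column of $B$ equals the corresponding column of $AB$:

```c
#include <assert.h>

/* Multiply two n-by-n matrices: C = A*B, row-major. */
void matmul(int n, const float *A, const float *B, float *C)
 {
  for (int i = 0; i < n; i++)
   for (int j = 0; j < n; j++)
    {
     float s = 0;
     for (int k = 0; k < n; k++) s += A[i*n + k]*B[k*n + j];
     C[i*n + j] = s;
    }
 }

/* Multiply an n-by-n matrix by a (column) vector: y = A*x.
   This computes exactly one column of a matrix/matrix product
   when x is a column of B. */
void matvec(int n, const float *A, const float *x, float *y)
 {
  for (int i = 0; i < n; i++)
   {
    float s = 0;
    for (int k = 0; k < n; k++) s += A[i*n + k]*x[k];
    y[i] = s;
   }
 }
```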

So we see that we can solve simultaneously many systems by bundling them into a single matrix/matrix product. In general, if we have $n$ patches to interpolate between points $y_0,y_1,\ldots,y_n$, the bundled system will look like

$$\begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1}\\ b_0 & b_1 & \cdots & b_{n-1}\\ c_0 & c_1 & \cdots & c_{n-1}\\ d_0 & d_1 & \cdots & d_{n-1} \end{pmatrix} = M^{-1} \begin{pmatrix} y_{-1} & y_0 & \cdots & y_{n-2}\\ y_0 & y_1 & \cdots & y_{n-1}\\ y_1 & y_2 & \cdots & y_n\\ y_2 & y_3 & \cdots & y_{n+1} \end{pmatrix}$$

where the coefficients are indexed by the patch number. We also put $y_{-1}=y_0$ for the first patch and $y_{n+1}=y_n$ for the last patch. You can devise other strategies to fill in the non-existent values, but in general simply putting a zero there is a bad idea.
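In code, the bundling might look like the following sketch (plain C, hypothetical names), which builds the four-point window for each patch, duplicating the boundary values at both ends, and produces the 4-row coefficient matrix, one column per patch:

```c
#include <assert.h>
#include <math.h>

/* Inverse Vandermonde matrix (times 6) for x = -1, 0, 1, 2. */
static const float inv_m6[4][4] =
 {
  { -1,  3, -3,  1 },
  {  3, -6,  3,  0 },
  { -2, -3,  6, -1 },
  {  0,  6,  0,  0 }
 };

/* Solve all n patches at once.  y holds the n+1 points
   y[0..n]; coeffs is 4 rows by n columns (row-major), one
   column (a, b, c, d) per patch.  Patch i interpolates
   between y[i] and y[i+1]; the out-of-range values y[-1]
   and y[n+1] are duplicated from y[0] and y[n]. */
void bundled_coefficients(int n, const float *y, float *coeffs)
 {
  for (int i = 0; i < n; i++)
   {
    /* the four points around patch i, clamped at the ends */
    float window[4] =
     {
      y[i > 0 ? i - 1 : 0],
      y[i],
      y[i + 1],
      y[i + 1 < n ? i + 2 : n]
     };
    for (int r = 0; r < 4; r++)
     {
      float s = 0;
      for (int k = 0; k < 4; k++) s += inv_m6[r][k]*window[k];
      coeffs[r*n + i] = s/6;
     }
   }
 }
```

The inner double loop is written out explicitly here; in practice this is exactly the 4-by-4 times 4-by-n product that a BLAS-style matrix/matrix multiply would perform in one call.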

*

* *

You can use a similar trick for evaluating many points in a single matrix/matrix product. We can rewrite the whole interpolation for a patch by computing a product $X\vec{a}$, where $X$ is a matrix whose $i$th row is given by $(x_i^3, x_i^2, x_i, 1)$ (where $x_i$ is a point lying somewhere between two of the original points, say $0\leqslant x_i\leqslant 1$), and where $\vec{a}$ is a vector containing the coefficients of the cubic solved around those points. This yields $X\vec{a}$, a (column) vector containing all the interpolated points on the patch.
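As a sketch (plain C, hypothetical names), evaluating $k$ points on a patch then reduces to a $k$-by-4 times 4-by-1 product:

```c
#include <assert.h>
#include <math.h>

/* Evaluate a patch at the k points x[0..k-1] by forming the
   k-by-4 matrix X whose ith row is (x_i^3, x_i^2, x_i, 1) and
   computing out = X*coeffs, where coeffs = (a, b, c, d). */
void evaluate_patch(int k, const float *x, const float coeffs[4],
                    float *out)
 {
  for (int i = 0; i < k; i++)
   {
    float row[4] = { x[i]*x[i]*x[i], x[i]*x[i], x[i], 1 };
    float s = 0;
    for (int j = 0; j < 4; j++) s += row[j]*coeffs[j];
    out[i] = s;
   }
 }
```

Again, the loop is spelled out for clarity; handing the whole $X$ matrix (or one $X$ per patch, concatenated) to a library matrix/matrix multiply is where the speed comes from.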

*

* *

So this seems an awfully complicated way to optimize interpolation. But we save on many levels. First, we save the overhead of calling the function that solves the system for each section, and the overhead of calling the function that evaluates the polynomial. Second, a good matrix/matrix multiply will be able to vectorize (i.e., use SSE+ instructions) and parallelize the matrix/matrix product in a way that is impossible if we perform separate calls for each matrix/vector product. In fact, the speed-ups can be quite impressive.

*

* *

So this concludes (for now) the series of entries on interpolation. It is by no means an exhaustive survey of the field, but it should have given you the gist of interpolation and how polynomial interpolation, in particular, is done.