On a couple of occasions I have discussed interpolation without really going into the details. Lately, I had to interpolate data, and that got me interested (again) in the topic; I think I should share some of the things I learned.
In this first post (of a series), let us begin with the simplest interpolation of all (after constant interpolation): linear interpolation.
OK. So let us say we have a series of points $(x_i, y_i)$, coming from an unknown function $f$. (Of course, the following will generalize to any number of dimensions, but for the sake of simplicity, let us only consider time-series-like data in two dimensions.) The goal of interpolation is to find a function $\hat{f}$ that yields a verisimilar approximation of $f$ (for various definitions of verisimilar). However, unlike fitting (a.k.a. regression), the function is not allowed to merely pass near the points while minimizing some average error; it must pass through the known points. That is, $\hat{f}(x_i) = y_i$ if $(x_i, y_i)$ is a known point.
The simplest possible such function is to use the last known value as an approximation. That is, $\hat{f}(x) = y_i$ if $x_i \le x < x_{i+1}$. Of course, it doesn't do that well, because it does not yield a smooth, pleasing curve, but a rather blockish approximation. We can refine this to "nearest neighbor" where $\hat{f}(x)$ returns the $y_i$ such that $i = \arg\min_j |x - x_j|$, that is, the $y$-value from the closest $x_i$ in the series.
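To make the two constant schemes concrete, here is a minimal Python sketch. The function names `last_value` and `nearest_neighbor` are mine, and I assume the known points are stored as two parallel lists, `xs` sorted in increasing order and `ys` holding the matching values. Queries outside the known range are simply clamped to the first or last point, which is one possible convention among others.

```python
from bisect import bisect_right

def last_value(xs, ys, x):
    """Piecewise-constant: return the y of the last known point with x_i <= x."""
    i = bisect_right(xs, x) - 1
    i = max(0, min(i, len(xs) - 1))  # clamp queries outside the known range
    return ys[i]

def nearest_neighbor(xs, ys, x):
    """Return the y of the known point whose x is closest to the query."""
    # On a tie, min() keeps the first candidate, i.e. the point on the left.
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
    return ys[i]

xs = [0.0, 1.0, 2.0, 4.0]
ys = [1.0, 3.0, 2.0, 6.0]
print(last_value(xs, ys, 1.9))        # 3.0 (last known point is x = 1)
print(nearest_neighbor(xs, ys, 1.9))  # 2.0 (closest known point is x = 2)
```

Both functions return the known value exactly when the query falls on a known point, so they do qualify as interpolation, blockish as the result may be.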
Next in complexity comes linear interpolation where we draw (virtually) a line between two consecutive points, and it is this line that will serve as an approximation to the underlying (and unknown) function that yielded the data points.
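The schoolbook expression for a straight line is given by
$$y = ax + b,$$
where $b$ is the base and $a$ the slope. In the case of linear interpolation, the base and slope are determined by the points $(x_i, y_i)$ and $(x_{i+1}, y_{i+1})$, with $x_i \le x \le x_{i+1}$. We can rewrite the above as
$$\hat{f}(x) = y_i + a\,(x - x_i),$$
where $a = \frac{y_{i+1} - y_i}{x_{i+1} - x_i}$ is the slope.
We can rearrange the equation slightly to get
$$\hat{f}(x) = y_i + \frac{x - x_i}{x_{i+1} - x_i}\,(y_{i+1} - y_i).$$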
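Now, the equation becomes the base, $y_i$, plus a smooth (but linear) variation between $y_i$ and $y_{i+1}$ controlled by $x$: a mix of $y_i$ and $y_{i+1}$. Indeed, let's get that result differently. Suppose we rewrite the linear interpolation as a blending of $y_i$ and $y_{i+1}$:
$$\hat{f}(x) = (1 - t)\,y_i + t\,y_{i+1},$$
where $0 \le t \le 1$. If we put $t = \frac{x - x_i}{x_{i+1} - x_i}$, we can expand the above to:
$$\hat{f}(x) = y_i + t\,(y_{i+1} - y_i) = y_i + \frac{x - x_i}{x_{i+1} - x_i}\,(y_{i+1} - y_i).$$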
So either way we look at it, we arrive at the same equation.
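Both ways of writing the interpolation translate directly into code. The Python sketch below is only an illustration: the names `slope_form`, `blend_form` and `linear_interpolate` are mine, and I assume the known points are given as two parallel lists with `xs` strictly increasing and at least two points. The segment lookup uses the standard `bisect` module.

```python
from bisect import bisect_right
import math

def slope_form(x0, y0, x1, y1, x):
    """Base y0 plus the slope times the distance from x0."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

def blend_form(x0, y0, x1, y1, x):
    """Mix of y0 and y1, weighted by t in [0, 1]."""
    t = (x - x0) / (x1 - x0)
    return (1.0 - t) * y0 + t * y1

def linear_interpolate(xs, ys, x):
    """Interpolate at x over a whole series (xs strictly increasing)."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the range of known points")
    # Index i of the segment with xs[i] <= x <= xs[i + 1].
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    return blend_form(xs[i], ys[i], xs[i + 1], ys[i + 1], x)

xs = [0.0, 1.0, 2.0, 4.0]
ys = [1.0, 3.0, 2.0, 6.0]
print(linear_interpolate(xs, ys, 3.0))  # 4.0, halfway between (2, 2) and (4, 6)
print(linear_interpolate(xs, ys, 1.0))  # 3.0, a known point is returned as is

# The two forms agree, up to floating-point rounding.
a = slope_form(2.0, 2.0, 4.0, 6.0, 3.3)
b = blend_form(2.0, 2.0, 4.0, 6.0, 3.3)
print(math.isclose(a, b))  # True
```

I picked the blend form for the series lookup because with $t = 0$ and $t = 1$ it reproduces the two end values exactly; the slope form is algebraically identical, but its rounding behaves a little differently.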
Of course there are better ways of drawing a line between two points. Interpolation may be used to fill a continuum or only to get a few more points. The above formula is efficient enough to get just a few more points; maybe not so much for a large number of points. In fact, I'm not sure what would be the most (computationally) efficient way of generating a very large number of points between two known points. Likely, precompute the slope and iterate, varying only $x$, using a finite-difference-like method.
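As a rough illustration of that incremental idea (not a claim that it is the fastest method), here is a Python sketch. The name `segment_points` and the choice of evenly spaced samples are assumptions of mine. The per-step increments are computed once, and each new point then costs two additions.

```python
def segment_points(x0, y0, x1, y1, n):
    """Yield n + 1 evenly spaced points from (x0, y0) to (x1, y1), inclusive."""
    if n < 1:
        raise ValueError("n must be at least 1")
    dx = (x1 - x0) / n  # step in x, computed once
    dy = (y1 - y0) / n  # step in y, i.e. slope * dx
    x, y = x0, y0
    for _ in range(n):
        yield x, y
        x += dx
        y += dy
    # Emit the end point exactly rather than the accumulated value,
    # since repeated additions can drift by a few rounding errors.
    yield x1, y1

print(list(segment_points(0.0, 0.0, 4.0, 8.0, 4)))
# [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
```

The trade-off is that rounding errors accumulate along the segment, whereas evaluating the formula at each point keeps every sample independent of the others.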
Linear interpolation is simple, maybe too simple, as it has quite limited expressiveness. Functions that we hypothesize to be smooth are rendered as a piecewise linear function, and for many applications, it creates objectionable artifacts (images linearly interpolated are ugly). Of course, a smooth interpolation that uses not only $(x_i, y_i)$ and $(x_{i+1}, y_{i+1})$ but also their neighbors ($(x_{i-1}, y_{i-1})$, $(x_{i+2}, y_{i+2})$, etc.) could capture more information about the (unknown) function and yield a smoother approximation.
Well, maybe we could fit something more flexible than a straight line through the points… maybe some polynomial?