The nice thing about the trigonometric circle is that it is, indeed, a circle. But that is also a disadvantage when you consider implementations of functions that manipulate angles, especially $\mathrm{mod}~2\pi$.

This is particularly a problem when you’re interested not so much in a specific angle (which is always well-defined) as in a series of angles varying in time. If you’re certain that your angle always stays within $\pm\pi$ or within $[0,2\pi]$, you do not expect problems. But what if, on the contrary, the angle wraps around?

So here’s how the problem arose for me (at least in a simplified version). I have a time series of angles $\{\theta_t\}_{t=1}^n$ and, given (a possibly large portion of) the history, I must predict $\theta_{t+1}$. Normally, that shouldn’t be too complicated, except that I do not generate the series, and all the angles I have are $\mathrm{mod}~2\pi$.

If I were the one generating the angles, I could simply have stored them as differences: starting from an initial $\theta_1$, I would emit only the differences $\Delta\theta_t=\theta_{t+1}-\theta_t$, and I would then be free to store internally angles of $42\pi$ if I pleased.
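A minimal Python sketch of that difference scheme, assuming the series is a plain list of floats; `encode_differences` and `decode_differences` are hypothetical names, not from any library. Decoding is just a running sum (`itertools.accumulate`, whose `initial` argument needs Python 3.8+), so the reconstructed angle is free to grow well past $2\pi$.

```python
import math
from itertools import accumulate

def encode_differences(thetas):
    """Split a series into its first angle and the successive differences."""
    return thetas[0], [b - a for a, b in zip(thetas, thetas[1:])]

def decode_differences(theta_1, deltas):
    """Rebuild the series by cumulative summation; nothing is ever wrapped."""
    return list(accumulate(deltas, initial=theta_1))

# Forty-two half-turns in a row: internally, the angle simply reaches 42*pi.
unwrapped = decode_differences(0.0, [math.pi] * 42)
```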

However, since the angles I have are stored $\mathrm{mod}~2\pi$, it is not always possible to figure out how to repair the series so that it is no longer wrapped. If the differences around a zero-crossing are small, the series goes from close to zero to close to $2\pi$ (or the reverse), and you can detect the jump and remove the offset of $2\pi$. But if the variation between consecutive angles can be large enough to hide a zero-crossing (or a $2\pi$-crossing), you have ambiguity: once a step can exceed $\pi$ in magnitude, an observed jump is consistent with several true steps differing by multiples of $2\pi$.
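To make the ambiguity concrete, here is a sketch of the usual unwrapping heuristic, similar in spirit to NumPy’s `numpy.unwrap` but written in plain Python. The function name `unwrap` is ours, and the whole thing rests on one assumption: every true step between consecutive samples is smaller than $\pi$ in magnitude. Each observed step is replaced by the equivalent step (modulo $2\pi$) of smallest magnitude.

```python
import math

TWO_PI = 2.0 * math.pi

def unwrap(wrapped):
    """Undo mod-2*pi wrapping, assuming every true step is smaller than pi.

    The first sample is kept as-is, so the result is only defined up to
    a constant multiple of 2*pi.
    """
    if not wrapped:
        return []
    out = [wrapped[0]]
    for prev, cur in zip(wrapped, wrapped[1:]):
        step = cur - prev
        # Fold the observed step into [-pi, pi): the smallest equivalent step.
        step = (step + math.pi) % TWO_PI - math.pi
        out.append(out[-1] + step)
    return out
```

With true steps of $0.2$ crossing zero, `unwrap` recovers the original series (up to floating-point error). With the true series `[0.0, 4.0, 8.0]`, whose steps exceed $\pi$, it silently returns $[0, 4-2\pi, 8-4\pi]$, which is just as consistent with the wrapped data.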

The solution I found was to use quadrature phase encoding, where angles are not stored directly as, say, $\theta$, but as a pair

$\mathrm{Enc}(\theta)=(\cos\theta,\sin\theta)$.

This encoding seemingly doubles the amount of data you have to learn from (if you’re doing machine learning), but in fact it’s only a re-representation of the input: it removes the wrap-arounds (without special cases) and varies smoothly with $\theta$, regardless of whether or not you’re crossing zero (or $2\pi$). Nothing is lost, either, since the angle can be recovered as $\theta=\mathrm{atan2}(\sin\theta,\cos\theta)$, up to a multiple of $2\pi$.
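A minimal sketch of the encoding and its inverse in Python, using the $(\cos\theta,\sin\theta)$ ordering (the order of the two components does not matter); `enc` and `dec` are illustrative names. Decoding uses `math.atan2`, which returns an angle in $[-\pi,\pi]$, so the decoded value is wrapped again, but that only matters once you leave the encoded space.

```python
import math

def enc(theta):
    """Quadrature encoding: the point (cos theta, sin theta) on the unit circle."""
    return (math.cos(theta), math.sin(theta))

def dec(pair):
    """Recover the angle, in [-pi, pi], from an encoded pair."""
    c, s = pair
    return math.atan2(s, c)
```

For instance, `enc(0.1)` and `enc(0.1 + 2*math.pi)` agree to within rounding, and `dec(enc(2*math.pi - 0.1))` gives approximately `-0.1`.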

Let us have a look at what this representation does on angle time series. For the purpose of the demonstration, I used a random walk where $\theta_{t+1}=\theta_t+\epsilon_t$, with $\epsilon_t\sim\mathcal{U}(-0.2,+0.2)$. In all graphs the colors are: black, the non-wrapped series; red, the series $\mathrm{mod}~2\pi$; dashed purple, the cosine component; and dashed blue, the sine component. So, let us look at a first graph:
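Series like the ones in the graphs can be produced with something like the following sketch. The function `random_walk` and its `step` and `seed` parameters are assumptions for illustration, and plotting is left out to keep it to the standard library. Note that the cosine and sine components are computed from the wrapped series, which is all a predictor would see, and they come out identical (up to rounding) to those of the non-wrapped series.

```python
import math
import random

def random_walk(n, step=0.2, theta_1=0.0, seed=None):
    """Non-wrapped random walk: theta_{t+1} = theta_t + U(-step, +step)."""
    rng = random.Random(seed)
    thetas = [theta_1]
    for _ in range(n - 1):
        thetas.append(thetas[-1] + rng.uniform(-step, step))
    return thetas

thetas = random_walk(100, seed=1)              # black: non-wrapped series
wrapped = [t % (2 * math.pi) for t in thetas]  # red: series mod 2*pi
cosines = [math.cos(w) for w in wrapped]       # dashed purple: cosine component
sines = [math.sin(w) for w in wrapped]         # dashed blue: sine component
```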

There’s a zero-crossing somewhere between $t=45$ and $t=50$, and the wrapped series jumps up to close to $2\pi$, leaving the predictor to figure out what happened there. Needless to say, in a compression setting, such jumps mean a lot of bits spent encoding the prediction error, as the prediction would probably be around $-0.1$ or so while the observed value is close to $2\pi$.
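The following sketch puts numbers on this, assuming the simplest possible predictor, “the next value equals the current one” (a hypothetical stand-in for a real model). It compares absolute prediction errors on the wrapped scalar series with Euclidean errors on the encoded pairs, for a true series drifting down through zero in steps of $0.05$.

```python
import math

TWO_PI = 2.0 * math.pi

def last_value_errors(series, distance):
    """Errors of the naive predictor 'next value = current value'."""
    return [distance(cur, nxt) for cur, nxt in zip(series, series[1:])]

# True series drifting down through zero in steps of 0.05 (never exactly 0).
true = [0.32 - 0.05 * t for t in range(13)]
wrapped = [t % TWO_PI for t in true]
encoded = [(math.cos(w), math.sin(w)) for w in wrapped]

scalar_err = last_value_errors(wrapped, lambda a, b: abs(b - a))
encoded_err = last_value_errors(encoded, math.dist)  # Euclidean distance
```

The largest scalar error, at the crossing, is close to $2\pi$, while every encoded error stays close to $0.05$, the size of the true step.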

The second example, I think, makes the point entirely. The (true) series remains close to zero and there are plenty of crossings. From the point of view of a learning algorithm, the wrapped sequence would seem rather hard to understand, because, in general, we do not assume that the learning algorithm knows anything about the data it learns from.

*
* *

If we look at what $(\cos\theta,\sin\theta)$ represents geometrically, we see that it just generates the points on the unit circle. If we cross zero or $2\pi$, $(\cos\theta,\sin\theta)$ just describes another point on the unit circle, and the variation is smooth (i.e., a small change in $\theta$ gives a small change in $(\cos\theta,\sin\theta)$).
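The smoothness can be made precise with a short derivation. For any angle $\theta$ and any increment $\delta$, expanding the squared distance between the two encoded points gives

$$\begin{aligned}
\|\mathrm{Enc}(\theta+\delta)-\mathrm{Enc}(\theta)\|^2
&=\big(\cos(\theta+\delta)-\cos\theta\big)^2+\big(\sin(\theta+\delta)-\sin\theta\big)^2\\
&=2-2\big(\cos(\theta+\delta)\cos\theta+\sin(\theta+\delta)\sin\theta\big)\\
&=2-2\cos\delta=4\sin^2(\delta/2),
\end{aligned}$$

so $\|\mathrm{Enc}(\theta+\delta)-\mathrm{Enc}(\theta)\|=2|\sin(\delta/2)|\le|\delta|$. The distance depends only on $\delta$, never on $\theta$ itself, so it does not matter whether the step crosses zero or $2\pi$.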

If we had a pair of angles, say $\phi$ and $\theta$, the encoding would now describe a sphere and would become $(\cos\phi\sin\theta,\sin\phi\sin\theta,\cos\theta)$, which is the spherical-to-Cartesian coordinate transform (with $\phi$ the azimuth and $\theta\in[0,\pi]$ the polar angle). If we had more angles (in a higher-dimensional case), we could find similar equations, the hyperspherical coordinates, to accommodate the encoding. In general, we will have exactly one more encoded component than the number of angles, so the relative cost, $(n+1)/n$ for $n$ angles, goes down as the number of dimensions goes up.
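The general case can be sketched with standard hyperspherical coordinates. In this Python sketch (`hypersphere_encode` is a hypothetical name), we assume that all angles but the last are polar angles in $[0,\pi]$ and that the last one is the azimuth, which may wrap freely; outside those ranges the point still lies on the unit sphere, but the map is no longer one-to-one. The components come out in the standard order, so for two angles $(\theta,\phi)$ the result is $(\cos\theta,\sin\theta\cos\phi,\sin\theta\sin\phi)$: the same three values as the spherical formula, in a different order.

```python
import math

def hypersphere_encode(angles):
    """Map n angles to a point on the unit n-sphere, in R^(n+1).

    Assumes hyperspherical coordinates: angles[:-1] are polar angles in
    [0, pi] and angles[-1] is the azimuth (any real, periodic in 2*pi).
    """
    coords = []
    sin_prod = 1.0  # running product of the sines of the angles seen so far
    for a in angles:
        coords.append(sin_prod * math.cos(a))
        sin_prod *= math.sin(a)
    coords.append(sin_prod)  # last component: product of all the sines
    return tuple(coords)
```

With a single angle this reduces to $(\cos\theta,\sin\theta)$; with $n$ angles it returns $n+1$ components whose squares sum to one.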

*
* *

I do not know yet whether this representation will help my learning algorithm, but I suspect that removing the (apparently random) discontinuities will not hurt. One could also address the more general problem of representing orientations relative to a given vector. What makes things complicated is that we are, in general, not very good at picturing high-dimensional objects: every time I start thinking about them, they end up looking curiously like 3D volumes and planes.