A while ago, Martin Capoušek drew my attention on Kahan’s summation algorithm and ask how it compared to the other methods I presented then.

Well, let’s find out.

Explorations in better, faster, stronger code.

A while ago, Martin Capoušek drew my attention on Kahan’s summation algorithm and ask how it compared to the other methods I presented then.

Well, let’s find out.

While flipping the pages of a “Win this interview” book—just being curious, not looking for a new job—I saw this seemingly simple question: how would you compute the sum of a series of floats contained in a array? The book proceeded with the simple, obvious answer. But… is it that obvious?

This week, let’s go back to (low level) programming, with IEEE floats. To unit test a function of float, it does not sound unreasonable to just *enumerate* them all. But how do we do that efficiently? Clearly `f++` will not get us there.

Nor will the machine-epsilon (the `std::numeric_limits::epsilon()`) because this value works fine around 1, but as the value diverges from 1, the epsilon basically becomes useless. We would either need a magnitude-dependent epsilon (which the standard library does not provide) or a way of enumerating explicitly the floats in increasing or decreasing order (something also not provided by the standard library). Well, let’s see how we can do that

The `float` and `double` floating-point data types have been present for a long time in the C (and C++) standard. While neither the C nor C++ standards do not enforce it, virtually all implementations comply to the IEEE 754—or try very hard to. In fact, I do not know as of today of an implementation that uses something very different. But the IEEE 754-type floats are aging. GPU started to add extensions such as short floats for evident reasons. Should we start considering adding new types on both ends of the spectrum?

The next step up, the quadruple precision float, is already part of the standard, but, as far as I know, not implemented anywhere. Intel x86 does have something in between for its internal float format on 80 bits, the so-called extended precision, but it’s not really standard as it is not sanctioned by the IEEE standards, and, generally speaking, and surprisingly enough, not really supported well by the instruction set. It’s sometimes supported by the `long double` C type. But, anyway, what’s in a floating point number?

Phimuemue, in a recent post (at the time of writing, anyway) present his variation on sorting floating point values using radix sort. His implementation wasn’t dealing with the pesky sign bit so I offered a slight modification to his algorithm as a comment on his blog. But for some reason, he did not allow to post it.

So I’ll present my solution here.