Preparing lecture notes (or videos) sometimes brings you to revisit things you’ve known for a while, but haven’t really taken time to formalize properly. One of those things is fixed-point arithmetic.

Fixed-point arithmetic joins the advantage of representing fractional values while maintaining a great simplicity in the operations. Contrary to floating point, arithmetic on fixed-points can be done using entirely integer operations, which explains their popularity on DSPs of all sorts.

Basically, a fixed-point number is just a fixed-width number of bits, with an unmoving, virtual point. For example, in a 16-bits register, four bits could be devoted to the fractional part, and 12 to the integer part, making a Q12.4 fixed-point (the Qn.m notation says that we have n integer bits, m fractional bits). In a fixed-point, the fractional bits behave as you would expect: the first after the point is worth 1/2, the next one 1/4, etc. Furthermore, the integers can be two’s-complement.

What makes it especially nice is the arithmetic that comes with fixed-point numbers:

**Addition/subtraction**. It behaves exactly as two’s-complement integer addition and subtraction. Therefore, adding/subtracting two fixed-point can be done directly by adding/subtracting the registers as signed integers.**multiplication**. Multiplying together two Qn.m fixed-points results in a Q2n.2m fixed-point, that we must restore to Q2n.m at least. (as always, if it overflows, that’s your problem). To multiply two Qn.m fixed-points, we use integer multiplication and shift back (right) m bits to restore the results to Qx.m. The nice thing is that it results in the correct truncation.**division**. Division is the tricky one (but isn’t much harder). Let:and

be two Qn.m values. Then

.

It’s as if we lost the fractional bits. That’s a problem because the quotient isn’t an integer; and it should be represented in ths. To express the result correctly in ths, we multiply the dividend (the number being divided) by , which is only a shift left.

So basically, fixed-point numbers are numbers expressed in ths.

*

* *

So let’s implement that. First, we need a cute little template to get, given an integer type, the type containing the double number of bits (for multiplication and division):

// enough.hpp #ifndef __MODULE_ENOUGH__ #define __MODULE_ENOUGH__ namespace enough { template <typename T> struct __twice; // incomplete type template <> struct __twice<int8_t> { using type = int16_t; }; template <> struct __twice<int16_t> { using type = int32_t; }; template <> struct __twice<int32_t> { using type = int64_t; }; template <> struct __twice<int64_t> { using type = int64_t; }; // fails silently template <typename T> using twice=typename __twice<T>::type; } // namespace enough #endif // __MODULE_ENOUGH__

We could use C++2a concepts to make sure that the fixed-point template has consistent arguments (I’m just beginning to look into C++20), and make sure we overload the basic operators:

// fixed.hpp #ifndef __MODULE_FIXED__ #define __MODULE_FIXED__ #include <enough.hpp> template <typename T, int scale=4> requires ((std::is_signed<T>::value==true) // c++2a && (scale<sizeof(T)*8)) class fixed { private: T number; fixed(int, T n) :number(n) {} public: // avec un autre fixed fixed operator+(const fixed & f) const { return fixed(0,number+f.number); } fixed operator-(const fixed & f) const { return fixed(0,number-f.number); } fixed operator*(const fixed & f) const { return fixed(0,(enough::twice<T>{number}*f.number) >> scale); } fixed operator/(const fixed & f) const { return fixed(0,(enough::twice<T>{number}<<scale)/f.number); } // avec float fixed operator+(float f) const { return *this+fixed(f); } fixed operator-(float f) const { return *this-fixed(f); } fixed operator*(float f) const { return *this*fixed(f); } fixed operator/(float f) const { return *this/fixed(f); } fixed operator=(const fixed & f) { return number=f.number; } bool operator==(const fixed & f) { return number==f.number; } operator float() const { return number/(float)(1<<scale); } fixed(): number(0) {} fixed(const fixed<T,scale> & other) : number(other.number) {}; fixed(float f) : number(f * (1<<scale)) {} }; #endif // __MODULE_FIXED__

We just use it:

#include <iostream> #include <fixed.hpp> int main() { fixed<int16_t> a(-4),b(3); std::cout << a+b << std::endl << a*b << std::endl << a/b << std::endl << a-b << std::endl ; return 0; }

*

* *

The complexity is visibly much, much less than floating point numbers. Addition/subtraction is just signed integer add/sub; multiplication and division ask for an integer promotion (the “cast”) and a shift. If you choose the number of bits after the point wisely, you may even get away with no-shifts and only memory operations. No wonder it’s used in low-complexity, low-power DSP chips.