Preparing lecture notes (or videos) sometimes brings you to revisit things you’ve known for a while, but haven’t really taken time to formalize properly. One of those things is fixed-point arithmetic.
Fixed-point arithmetic joins the advantage of representing fractional values while maintaining a great simplicity in the operations. Contrary to floating point, arithmetic on fixed-points can be done using entirely integer operations, which explains their popularity on DSPs of all sorts.
Basically, a fixed-point number is just a fixed-width number of bits, with an unmoving, virtual point. For example, in a 16-bits register, four bits could be devoted to the fractional part, and 12 to the integer part, making a Q12.4 fixed-point (the Qn.m notation says that we have n integer bits, m fractional bits). In a fixed-point, the fractional bits behave as you would expect: the first after the point is worth 1/2, the next one 1/4, etc. Furthermore, the integers can be two’s-complement.
What makes it especially nice is the arithmetic that comes with fixed-point numbers:
- Addition/subtraction. It behaves exactly as two’s-complement integer addition and subtraction. Therefore, adding/subtracting two fixed-point can be done directly by adding/subtracting the registers as signed integers.
- multiplication. Multiplying together two Qn.m fixed-points results in a Q2n.2m fixed-point, that we must restore to Q2n.m at least. (as always, if it overflows, that’s your problem). To multiply two Qn.m fixed-points, we use integer multiplication and shift back (right) m bits to restore the results to Qx.m. The nice thing is that it results in the correct truncation.
- division. Division is the tricky one (but isn’t much harder). Let:
and
be two Qn.m values. Then
.
It’s as if we lost the fractional bits. That’s a problem because the quotient isn’t an integer; and it should be represented in
ths. To express the result correctly in
ths, we multiply the dividend (the number being divided) by
, which is only a shift left.
So basically, fixed-point numbers are numbers expressed in ths.
*
* *
So let’s implement that. First, we need a cute little template to get, given an integer type, the type containing the double number of bits (for multiplication and division):
// enough.hpp #ifndef __MODULE_ENOUGH__ #define __MODULE_ENOUGH__ namespace enough { template <typename T> struct __twice; // incomplete type template <> struct __twice<int8_t> { using type = int16_t; }; template <> struct __twice<int16_t> { using type = int32_t; }; template <> struct __twice<int32_t> { using type = int64_t; }; template <> struct __twice<int64_t> { using type = int64_t; }; // fails silently template <typename T> using twice=typename __twice<T>::type; } // namespace enough #endif // __MODULE_ENOUGH__
We could use C++2a concepts to make sure that the fixed-point template has consistent arguments (I’m just beginning to look into C++20), and make sure we overload the basic operators:
// fixed.hpp #ifndef __MODULE_FIXED__ #define __MODULE_FIXED__ #include <enough.hpp> template <typename T, int scale=4> requires ((std::is_signed<T>::value==true) // c++2a && (scale<sizeof(T)*8)) class fixed { private: T number; fixed(int, T n) :number(n) {} public: // avec un autre fixed fixed operator+(const fixed & f) const { return fixed(0,number+f.number); } fixed operator-(const fixed & f) const { return fixed(0,number-f.number); } fixed operator*(const fixed & f) const { return fixed(0,(enough::twice<T>{number}*f.number) >> scale); } fixed operator/(const fixed & f) const { return fixed(0,(enough::twice<T>{number}<<scale)/f.number); } // avec float fixed operator+(float f) const { return *this+fixed(f); } fixed operator-(float f) const { return *this-fixed(f); } fixed operator*(float f) const { return *this*fixed(f); } fixed operator/(float f) const { return *this/fixed(f); } fixed operator=(const fixed & f) { return number=f.number; } bool operator==(const fixed & f) { return number==f.number; } operator float() const { return number/(float)(1<<scale); } fixed(): number(0) {} fixed(const fixed<T,scale> & other) : number(other.number) {}; fixed(float f) : number(f * (1<<scale)) {} }; #endif // __MODULE_FIXED__
We just use it:
#include <iostream> #include <fixed.hpp> int main() { fixed<int16_t> a(-4),b(3); std::cout << a+b << std::endl << a*b << std::endl << a/b << std::endl << a-b << std::endl ; return 0; }
*
* *
The complexity is visibly much, much less than floating point numbers. Addition/subtraction is just signed integer add/sub; multiplication and division ask for an integer promotion (the “cast”) and a shift. If you choose the number of bits after the point wisely, you may even get away with no-shifts and only memory operations. No wonder it’s used in low-complexity, low-power DSP chips.