The Fibonacci numbers are very interesting for many reasons, from seed heads to universal coding (such as Taboo Codes). Just figuring how to compute them efficiently is plenty of fun.

The classical formulation of the Fibonacci numbers is given by the following recurrence:

$$F(n)=\begin{cases}1 & \text{if } n\leqslant 2,\\ F(n-1)+F(n-2) & \text{otherwise.}\end{cases}$$

This suggests the direct (and naïve) recursive implementation:

```cpp
uint64_t fibo_rec(uint64_t n)
{
    if (n <= 2)
        return 1;
    else
        return fibo_rec(n - 1) + fibo_rec(n - 2);
}
```

This is lovely, but it generates $\Theta(\phi^n)$ recursive calls to compute $F(n)$! Run-time grows exponentially in $n$ (since $F(n)\approx\phi^n/\sqrt{5}$, where $\phi=\frac{1+\sqrt{5}}{2}\approx 1.618$ is the golden ratio).

We can avoid recursive calls by optimizing the tail recursion explicitly:

```cpp
uint64_t fibo_iter(uint64_t n)
{
    uint64_t n_1 = 1, n_2 = 0;
    for (uint64_t i = 0; i < n; i++)
    {
        uint64_t t = n_1 + n_2;
        n_2 = n_1;
        n_1 = t;
    }
    return n_1;
}
```

This now computes $F(n)$ in $O(n)$, which is exponentially better than the naïve recursive version. The iterative version uses something of a shift register to slide the successive Fibonacci numbers into `n_2` and `n_1`, using a temporary variable to hold the sum. Some may object to the temporary variable, so we can rewrite the above as:

```cpp
uint64_t fibo_iter_var(uint64_t n)
{
    uint64_t n_1 = 1, n_2 = 0;
    for (uint64_t i = 0; i < n; i++)
    {
        n_1 = n_1 + n_2;
        n_2 = n_1 - n_2;
    }
    return n_1;
}
```

The part `n_2=n_1-n_2` recovers the previous value of `n_1` (as it computes `(n_1+n_2)-n_2`, which is the old `n_1`). So we trade a temporary variable for an extra operation. It may, or may not, be worth it.

* * *

Can we do much better? A priori, it seems that linear time is the best we can do. Fortunately, some astute fellow noticed that the matrix

$$M=\begin{pmatrix}0 & 1\\ 1 & 1\end{pmatrix}$$

has the property of “shifting” and adding components of a vector. For example,

$$\begin{pmatrix}0 & 1\\ 1 & 1\end{pmatrix}\begin{pmatrix}a\\ b\end{pmatrix}=\begin{pmatrix}b\\ a+b\end{pmatrix},$$

which should look familiar. Indeed, if we replace $a$ by $F(n-1)$ and $b$ by $F(n)$, we get

$$\begin{pmatrix}0 & 1\\ 1 & 1\end{pmatrix}\begin{pmatrix}F(n-1)\\ F(n)\end{pmatrix}=\begin{pmatrix}F(n)\\ F(n+1)\end{pmatrix}.$$

That’s swell! But this supposes that we already know $F(n-1)$ and $F(n)$. This suggests a (recursive) decomposition:

$$\begin{pmatrix}F(n)\\ F(n+1)\end{pmatrix}=M\begin{pmatrix}F(n-1)\\ F(n)\end{pmatrix}=M^2\begin{pmatrix}F(n-2)\\ F(n-1)\end{pmatrix}=\cdots$$

and if we go all the way down to the base cases $F(1)=1$ and $F(2)=1$ (with $F(0)=0$), we have

$$M^n=\begin{pmatrix}F(n-1) & F(n)\\ F(n) & F(n+1)\end{pmatrix}.$$

But computing a power of a matrix seems expensive: at least a linear number of matrix multiplies. Fortunately, no. We can compute $M^n$ in $O(\lg n)$ matrix multiplications (here, we take the cost of a matrix product as constant, since the dimension of the matrix is constant, $2\times 2$). How? By noticing that, for example, $x^8=((x^2)^2)^2$ or $x^9=x\cdot((x^2)^2)^2$. For integers, computing $x^n$ would look like:

```cpp
int expo(int x, int n)
{
    int t = 1;
    int y = x;
    while (n)
    {
        if (n & 1)
            t *= y;
        y *= y;
        n >>= 1;
    }
    return t;
}
```

We can do the same thing using matrices. In the above, we have `t`, the running product, and `y`, a “squaring” product. If we write

$$t=\begin{pmatrix}a_t & b_t\\ c_t & d_t\end{pmatrix}$$

and

$$y=\begin{pmatrix}a_y & b_y\\ c_y & d_y\end{pmatrix},$$

with the first matrix initialized to the identity and the second to the Fibonacci matrix. Since we know the matrix is symmetric, we can kill off $c_t$ and $c_y$, since they’re always identical to $b_t$ and $b_y$, respectively. The fun begins:

$$t\,y=\begin{pmatrix}a_t a_y+b_t b_y & a_t b_y+b_t d_y\\ - & b_t b_y+d_t d_y\end{pmatrix}$$

and

$$y^2=\begin{pmatrix}a_y^2+b_y^2 & b_y(a_y+d_y)\\ - & b_y^2+d_y^2\end{pmatrix},$$

where the dashes indicate symmetry.

Plugging the above results into the exponentiation function, we get

```cpp
uint64_t fibo_expo(uint64_t n)
{
    uint64_t a_t = 1, b_t = 0, d_t = 1, // "1"
             a_y = 0, b_y = 1, d_y = 1; // "y"

    while (n)
    {
        if (n & 1)
        {
            // t*=y;
            uint64_t a = a_t*a_y + b_t*b_y;
            uint64_t b = a_t*b_y + b_t*d_y;
            uint64_t d = b_t*b_y + d_t*d_y;
            a_t = a;
            b_t = b;
            d_t = d;
        }

        // y*=y;
        uint64_t a = a_y*a_y + b_y*b_y;
        uint64_t b = b_y*(a_y + d_y);
        uint64_t d = b_y*b_y + d_y*d_y;
        a_y = a;
        b_y = b;
        d_y = d;

        n >>= 1;
    }
    return b_t;
}
```

If we factor out common sub-expressions and assign in such an order that the temporaries aren’t needed:

```cpp
uint64_t fibo_expo_var(uint64_t n)
{
    uint64_t a_t = 1, b_t = 0, d_t = 1, // "1"
             a_y = 0, b_y = 1, d_y = 1; // "y"

    while (n)
    {
        if (n & 1)
        {
            // t*=y;
            uint64_t bx = b_t*b_y;
            b_t = a_t*b_y + b_t*d_y;
            a_t = a_t*a_y + bx;
            d_t = bx + d_t*d_y;
        }

        // y*=y;
        uint64_t b2 = b_y*b_y;
        b_y *= (a_y + d_y);
        a_y = a_y*a_y + b2;
        d_y = b2 + d_y*d_y;

        n >>= 1;
    }
    return b_t;
}
```

* * *

That’s good, we have an $O(\lg n)$ algorithm to compute the Fibonacci numbers. But we can still do slightly better. How? Well, we can notice that

$$d=a+b$$

(in $M^n$, the lower-right entry $F(n+1)$ is the sum of the two entries $F(n-1)$ and $F(n)$ above it), so we can kill off $d$. This will let us have equations using *only* the variables $a$ and $b$. Let us rework the product and the squaring terms using only $a$ and $b$:

$$t\,y=\begin{pmatrix}a_t a_y+b_t b_y & a_t b_y+b_t a_y+b_t b_y\\ - & -\end{pmatrix}$$

and

$$y^2=\begin{pmatrix}a_y^2+b_y^2 & 2a_y b_y+b_y^2\\ - & -\end{pmatrix},$$

where the dashes represent the values for which *we don’t even care anymore*. Using these results yields:

```cpp
uint64_t fibo_expo_var_2(uint64_t n)
{
    uint64_t a_t = 1, b_t = 0, // "1"
             a_y = 0, b_y = 1; // "y"

    while (n)
    {
        if (n & 1)
        {
            // t*=y;
            uint64_t bx = b_t*b_y;
            b_t = a_t*b_y + b_t*a_y + bx;
            a_t = a_t*a_y + bx;
        }

        // y*=y;
        uint64_t b2 = b_y*b_y;
        b_y = 2*a_y*b_y + b2;
        a_y = a_y*a_y + b2;

        n >>= 1;
    }
    return b_t;
}
```

* * *

So how do they compare, speed-wise? The naïve recursive one does dreadfully. It takes over a minute to yield the sequence up to even moderately large $n$; the others take microseconds.

For 10000000 (10 million) iterations (using the same random series of queries, with random $n$ varying from 1 to 70), we have:

| version | time |
|:--|--:|
| iter | 3.29529s |
| iter_var | 3.30153s |
| expo | 2.28118s |
| expo_var | 2.2531s |
| expo_var_2 | 2.22531s |

The last three versions are already much faster, but the speed does not change all that much between the fast exponentiation variants. If the gain doesn’t seem all that impressive compared to the iterative versions, let us remind ourselves that we are only computing relatively small Fibonacci numbers. 64-bit unsigned integers only allow storing up to $F(93)$ (because $\left\lfloor\log_\phi\left(\sqrt{5}\cdot 2^{64}\right)\right\rfloor=93$; remember, $F(n)\approx\phi^n/\sqrt{5}$, so taking the log this way indeed solves for the largest $n$ that makes $F(n)\leqslant 2^{64}$), but if we were to use big integers (abstracted through a class), we could compute very large Fibonacci numbers immensely faster than using the iterative method. Or the recursive one.