## Fast Fibonacci

The Fibonacci numbers are very interesting for many reasons, from seed heads to universal coding (such as Taboo Codes). Just figuring out how to compute them efficiently is plenty of fun.

The classical formulation of the Fibonacci numbers is given by the following recurrence:

$F_n=\begin{cases} 1 &\mathrm{if~}n\leqslant{2}\\ F_{n-1}+F_{n-2} &\mathrm{otherwise}\end{cases}$

This suggests the direct (and naïve) recursive implementation:

```c
uint64_t fibo_rec(uint64_t n)
{
    if (n<=2)
        return 1;
    else
        return fibo_rec(n-1)+fibo_rec(n-2);
}
```


This is lovely, but it generates $O(F_n)$ recursive calls to compute $F_n$! Run-time grows exponentially in $n$ (since $F_n \approx \phi^n/\sqrt{5}$, where $\phi=\frac{1}{2}(1+\sqrt{5})$ is the golden ratio).

We can avoid the recursive calls altogether by computing the sequence iteratively, from the bottom up:

```c
uint64_t fibo_iter(uint64_t n)
{
    uint64_t n_1=1,n_2=0;

    // n-1 iterations bring n_1 from F_1 to F_n
    for (uint64_t i=1;i<n;i++)
    {
        uint64_t t=n_1+n_2;
        n_2=n_1;
        n_1=t;
    }

    return n_1;
}
```


This now computes $F_n$ in $O(n)$, which is exponentially better than the naïve recursive version. The iterative version uses something of a shift register to slide the successive Fibonacci numbers through n_2 and n_1, using a temporary variable to hold the sum. Some may object to the temporary variable, so we can rewrite the above as:

```c
uint64_t fibo_iter_var(uint64_t n)
{
    uint64_t n_1=1,n_2=0;

    for (uint64_t i=1;i<n;i++)
    {
        n_1=n_1+n_2;
        n_2=n_1-n_2;
    }

    return n_1;
}
```


The part n_2=n_1-n_2 recovers the old value of n_1 (as $n_2\gets n_1-n_2$ is really $n_2\gets(n_1+n_2)-n_2$, using the old $n_1$, which is $n_2\gets n_1$). So we trade a temporary variable for an extra operation. It may, or may not, be worth it.

* * *

Can we do much better? A priori, it seems that linear time is the best we can do. Fortunately, some astute fellow noticed that the matrix

$\left[\begin{array}{cc} 0 & 1\\ 1 & 1 \end{array}\right]$

has the property of “shifting” and adding components of a vector. For example,

$\left[\begin{array}{cc} 0 & 1\\ 1 & 1 \end{array}\right] \left[\begin{array}{c} a\\ b \end{array}\right] = \left[\begin{array}{c} b\\ a+b\end{array}\right]$

which should look familiar. Indeed, if we replace $a$ by $F_{n-2}$ and $b$ by $F_{n-1}$, we get

$\left[\begin{array}{cc} 0 & 1\\ 1 & 1 \end{array}\right] \left[\begin{array}{c} F_{n-2}\\ F_{n-1} \end{array}\right] = \left[\begin{array}{c} F_{n-1}\\ F_{n-1}+F_{n-2} \end{array}\right] = \left[\begin{array}{c} F_{n-1}\\ F_n\end{array}\right]$

That’s swell! But this supposes that we already know $F_{n-1}$ and $F_{n-2}$. This suggests a (recursive) decomposition:

$\left[\begin{array}{cc} 0 & 1\\ 1 & 1 \end{array}\right] \left[\begin{array}{c} F_{n-2}\\ F_{n-1}\end{array}\right]=\left[\begin{array}{cc} 0 & 1\\ 1 & 1 \end{array}\right] \left[\begin{array}{cc} 0 & 1\\ 1 & 1 \end{array}\right]\left[\begin{array}{c} F_{n-3}\\ F_{n-2}\end{array}\right]$

and if we go all the way down to the base cases $F_1=1$ and $F_2=1$, we have

$\left[\begin{array}{cc} 0 & 1\\ 1 & 1\end{array}\right]^{n-2} \left[\begin{array}{c}1\\ 1 \end{array}\right]=\left[\begin{array}{c} F_{n-1}\\ F_{n}\end{array}\right]$

Equivalently, the powers of the matrix contain the Fibonacci numbers directly:

$\left[\begin{array}{cc} 0 & 1\\ 1 & 1\end{array}\right]^{n}=\left[\begin{array}{cc} F_{n-1} & F_{n}\\ F_{n} & F_{n+1}\end{array}\right]$

so computing the $n$-th power of the matrix and reading off its top-right entry gives $F_n$.

But computing a power of a matrix seems expensive: a linear number of matrix multiplications, at least. Fortunately, no. We can compute $A^n$ in $O(\lg n)$ (here, we take the matrix product as constant, since the dimension of the matrix is constant, $2\times{2}$). How? By noticing that, for example, $x^5=(x^2)^2x$ or $x^7=((x^2)x)^2x$. For integers, computing $x^n$ would look like:

```c
int expo(int x, int n)
{
    int t=1;
    int y=x;
    while (n)
    {
        if (n & 1) t*=y;
        y*=y;
        n>>=1;
    }
    return t;
}
```


We can do the same thing using matrices. In the above, we have t, the running product, and y, a “squaring” product. If we write

$\left[\begin{array}{cc} a_t & b_t\\ c_t & d_t \end{array}\right]$ and $\left[\begin{array}{cc}a_y & b_y\\ c_y & d_y\end{array}\right]$

with the first matrix initialized to the identity and the second to the Fibonacci matrix. Since both $t$ and $y$ are powers of the (symmetric) Fibonacci matrix, they remain symmetric, so we can kill off $c_t$ and $c_y$, since they’re always identical to $b_t$ and $b_y$, respectively. The fun begins:

$\left[\begin{array}{cc} a_t & b_t\\ b_t & d_t\end{array}\right]\left[\begin{array}{cc}a_y & b_y\\ b_y & d_y\end{array}\right]=\left[\begin{array}{cc}a_ta_y+b_tb_y & a_tb_y+b_td_y\\ --- & b_tb_y+d_td_y\end{array}\right]$

and

$\left[\begin{array}{cc} a_y & b_y\\ b_y & d_y\end{array}\right]^2=\left[\begin{array}{cc} a_y^2+b_y^2 & b_y(a_y+d_y)\\ --- & b_y^2+d_y^2\end{array}\right]$

where the dashes indicate entries known by symmetry.

Plugging the above results into the exponentiation function, we get


```c
uint64_t fibo_expo(uint64_t n)
{
    uint64_t
        a_t=1, b_t=0, d_t=1, // "1"
        a_y=0, b_y=1, d_y=1; // "y"

    while (n)
    {
        if (n & 1)
        {
            // t*=y;
            uint64_t a = a_t*a_y+b_t*b_y;
            uint64_t b = a_t*b_y+b_t*d_y;
            uint64_t d = b_t*b_y+d_t*d_y;

            a_t=a;
            b_t=b;
            d_t=d;
        }

        // y*=y;
        uint64_t a = a_y*a_y+b_y*b_y;
        uint64_t b = b_y*(a_y+d_y);
        uint64_t d = b_y*b_y+d_y*d_y;

        a_y=a;
        b_y=b;
        d_y=d;

        n>>=1;
    }
    return b_t;
}
```


If we factor out common sub-expressions and assign in an order such that the temporaries aren’t needed:

```c
uint64_t fibo_expo_var(uint64_t n)
{
    uint64_t
        a_t=1, b_t=0, d_t=1, // "1"
        a_y=0, b_y=1, d_y=1; // "y"

    while (n)
    {
        if (n & 1)
        {
            // t*=y;
            uint64_t bx = b_t * b_y;
            b_t = a_t*b_y+b_t*d_y;
            a_t = a_t*a_y+bx;
            d_t = bx+d_t*d_y;
        }

        // y*=y;
        uint64_t b2 = b_y*b_y;
        b_y *= (a_y+d_y);
        a_y = a_y*a_y+b2;
        d_y = b2+d_y*d_y;

        n>>=1;
    }
    return b_t;
}
```


* * *

That’s good: we have an $O(\lg n)$ algorithm to compute the Fibonacci numbers. But we can still do slightly better. How? Well, we can notice that

$\left[\begin{array}{cc}a & b\\ b & d\end{array}\right]=\left[\begin{array}{cc}a & b\\ b & a+b\end{array}\right]$

so we can kill off $d$. This will let us have equations using only the variables $a$ and $b$. Let us rework the product and the squaring terms using only $a$ and $b$:

$\left[\begin{array}{cc}a_t & b_t\\ b_t & a_t+b_t\end{array}\right]\left[\begin{array}{cc}a_y & b_y\\ b_y & a_y+b_y\end{array}\right]=\left[\begin{array}{cc}a_ta_y+b_tb_y & a_tb_y+b_t(a_y+b_y)\\--- & --- \end{array}\right]$

and

$\left[\begin{array}{cc} a_y & b_y\\ b_y & a_y+b_y\end{array}\right]^2=\left[\begin{array}{cc} a_y^2+b_y^2 & a_yb_y+b_y(a_y+b_y)\\ --- & ---\end{array}\right]$

where the dashes represent the values we don’t even care about anymore. Using these results yields:

```c
uint64_t fibo_expo_var_2(uint64_t n)
{
    uint64_t
        a_t=1, b_t=0, // "1"
        a_y=0, b_y=1; // "y"

    while (n)
    {
        if (n & 1)
        {
            // t*=y;
            uint64_t bx=b_t*b_y;
            b_t = a_t*b_y+b_t*a_y+bx;
            a_t = a_t*a_y+bx;
        }

        // y*=y;
        uint64_t b2 = b_y*b_y;
        b_y = 2*a_y*b_y+b2;
        a_y = a_y*a_y+b2;

        n>>=1;
    }
    return b_t;
}
```


* * *

So how do they compare, speed-wise? The naïve recursive one does dreadfully: it takes over a minute to yield the sequence up to $F_{50}$. The others take microseconds.

For 10000000 (10 million) iterations (using the same random series of queries, with random $n$ varying from 1 to 70), we have

| version | time |
|---|---|
| `iter` | 3.29529 s |
| `iter_var` | 3.30153 s |
| `expo` | 2.28118 s |
| `expo_var` | 2.2531 s |
| `expo_var_2` | 2.22531 s |

The last three versions are already much faster, but the speed does not change all that much between the fast exponentiation variants. If the gain doesn’t seem all that impressive compared to the iterative versions, let us remind ourselves that we are only computing relatively small Fibonacci numbers. 64-bit unsigned integers can only hold up to $F_{93}$ (because $F_n\approx\phi^n/\sqrt{5}$, so solving $\phi^n/\sqrt{5}=2^{64}$ gives $n=\log_\phi(\sqrt{5}\,2^{64})\approx 93.9$), but if we were to use big integers (abstracted through a class), we could compute $F_{129713151}$ immensely faster than using the iterative method. Or the recursive one.