Calculating unreasonably large Fibonacci numbers quickly

I love big numbers and recursion. This time, we'll start by showing that (naive) recursion isn't the right way to solve a problem, then follow with some better ways.

The Fibonacci sequence is an example of a recursively defined sequence of numbers. The first two numbers in the sequence are defined as 0 and 1, and each following number is defined as the sum of the two previous numbers.

Given this definition, it is very easy to write an extremely inefficient recursive function to calculate the n'th number:

def fib(n):
    if n == 0 or n == 1: return n
    return fib(n-1) + fib(n-2)

In most standard languages (the above is intended to be Python) this is inefficient because the program is approached as a sequence of operations. Each time "fib(13) is encountered, the body of the fib function is executed. This in turn evaluates "fib(12)" and "fib(11)". "fib(12)" evaluates "fib(11)" again. In the end, the total number of times a "fib(n)" is evaluated grows very quickly compared to "n". In fact, I think it is one of those cases of increasing literally exponentially. This is not a good method to find the Fibonacci sequence or individual Fibonacci numbers.

Another way that works better is a loop that calculates each number in the sequence from the previous two, until the desired one is reached:

def fib(n):
    a, b = 0, 1
    for i in range(n):
        a, b = a+b, a
    return a

Now, the n'th number is calculated after only n additions. A big improvement! This method is quite good for finding the entire Fibonacci sequence up to some 'n', or for finding particular values for 'n' in the hundreds or thousands.

Past this, if you don't need an exact number, you can use floating-point arithmetic, using a closed-form expression involving (this sounds funny if you don't know it's math words, don't laugh) "powers of the golden ratio".

import decimal

def fib(n):
    sqrt_5 = decimal.Decimal(5).sqrt()
    phi = (1 + sqrt_5) / 2
    return (phi**n - (-phi)**(-n)) / sqrt_5

.. fib(4784970) is Decimal('1.735729075920066439815208698E+999999'), give or take some of the last decimal places. After that, Python's standard decimal package throws in the towel.

You can make approximations beyond this by doing log-math. First, notice how the (-phi) ** (-n) term goes quickly to zero, leaving just phi ** n / sqrt(5). You can find the log10 of a really huge Fibonacci numbe as:

from math import log

def logfib(n, base=10):
    log_sqrt_5 = log(5, base) / 2
    log_phi = log((1 + 5**.5) / 2, base)
    return log_phi * n - log_sqrt_5

def print_log(n):
    whole, frac = divmod(n, 1)
    print(f"{10 ** frac}e{whole:.0f}")

Here, we can get 1.735729075774171e999999 as an estimate of fib(4784970), but can also easily get 1.4140750583014141e417975280 as an estimate of fib(2_000_000_000). As you'll see shortly, the first 8 digits of that second value are accurate.

The method I really want to talk about, though, is the one which can exactly calculate Fibonacci numbers—all the binary or decimal digits, and can actually find the 2 billionth number in short order and the 16 billionth one with effort (and a bigger swapfile). The program usually skips printing the actual number since converting the number from binary to decimal and printing it actually takes a fair amount of time.

$ time fib4.py 2000000000
2000000000: 417975281 digits: 141407506869875260956516…969828793289064439453125
217.43user 14.56system 0:40.17elapsed 577%CPU…
$ time ./fib4.py 16000000000
16000000000: 3343802244 digits: 446863351203393654391107…465863000407211560546875
1931.98user 205.95system 8:22.67elapsed 425%CPU…

How's this work? As usual, by standing on the shoulders of giants. First, the matrix form of the Fibonacci number, in which case the n'th number can be found by taking a particular element of the n'th power of a 2x2 matrix. Second, the "repeated squaring" algorithm, for finding the n'th power in only about log2(n) operations. Third, the gmp library for high quality multiprecision library and its "gmpy" wrapper for Python.

Besides the script below, I'm using a modified version of gmpy which allows the "+" and "*" operations to be multithreaded. This can give a good speed-up as the matrix multiplication step's individual element multiplcations can take place in parallel. As shown above, this improves the elapsed time by more than a factor of 4.

The main barrier to going further seems to be RAM. Doubling n increases the memory needs by about 1.88x, and for the 2 billionth Fibonacci number we're only at about 418 million digits. While the records are strictly incomparable, this is on a par with the largest number of digits of pi computed…in 1999.

I don't know that anyone's worked to set a record largest Fibonacci number calculated, but let this program be my entry in the contest if so.