A use for log1p

The ANSI C math library includes a few curious functions such as log1p, which is duly copied (or wrapped) in other languages like Python. The programmer without a background in numerics may wonder why such a function is necessary or useful. Here is a real world example in calculating the likelihood of a random collision in a Birthday Attack.

Wikipedia gives the formula n(p;H) ≈ sqrt(2 H ln(1/(1-p)). If you directly transcribe that expression into Python:

def n_naive(p,H):
    return max(2,sqrt(2 * H * log(1/(1-p))))
and try to reproduce the leftmost columns of that table:
def dotable(f):
    bb = (16, 32, 64, 128, 256, 384, 512)
    pp = (1e-18, 1e-15, 1e-12, 1e-9, 1e-6, .001)
    pt = "1e-18 1e-15 1e-12 1e-9 1e-6 .1%".split()

    print "    " + " ".join("%7s" % p for p in pt)
    for b in bb:
        print "%3d" % b,
        for p in pp:
            print "%7.2g" % f(p,2**b),
        print

dotable(n_naive)
you'll be disappointed with the results: the left-hand column will be all "2"s, and the second column will have several values that don't even share one significant figure with the true value.

The reason is that the subexpression log(1/(1-p)) is computing a value very close to 1, then taking its log. For very small values of p, 1-p isn't even different from p, leading to the computation log(1) == 0!

This is a perfect opportunity to use log1p. First, use the identity ln 1/x = -ln x to obtain the expression -log(1-p) but -log(1-p) = -log1p(-p). This leads to the second implementation of the function:

def n_clever(p,H):
    return max(2,sqrt(2 * H * -log1p(-p)))

dotable(n_clever)
Now, the values shown will match Wikipedia's table.

Files currently attached to this page:

nrc.py591 bytes



Entry first conceived on 29 May 2012, 13:03 UTC, last modified on 8 October 2012, 22:56 UTC
Website Copyright © 2004-2017 Jeff Epler