So in previous installments, we looked at how to use the Euclidean algorithm to devise a Las Vegas-type algorithm for primality testing. We also found that, in general, simply testing factors one after the other is much more efficient (which doesn't mean there aren't better algorithms for primality testing!).

We also considered only relatively small primorials (the products of the first prime numbers) since they rapidly exceed machine-size integers. But just how fast do primorials grow?

The first step is to define the primorials, $p_n\#$ (some use $n\#$, which I find most horrible). $p_n\#$ is defined as

$$p_n\# = \prod_{i=1}^{n} p_i,$$

with $p_i$ being the $i$th prime number.
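As a quick sanity check of the definition, here is a minimal plain-Python sketch (plain Python rather than Sage, so it runs anywhere; the helper names `nth_primes` and `primorial` are mine, not from the earlier installments):

```python
from math import prod  # Python 3.8+

def nth_primes(n):
    """Return the first n prime numbers by simple trial division."""
    primes = []
    candidate = 2
    while len(primes) < n:
        # a candidate is prime if no smaller prime divides it
        if all(candidate % p != 0 for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def primorial(n):
    """p_n# : the product of the first n primes."""
    return prod(nth_primes(n))

print([primorial(n) for n in range(1, 6)])  # [2, 6, 30, 210, 2310]
```

Even for tiny $n$ the growth is visibly rapid, which is the point of the rest of this post.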

The second step is to compute the primorials and gather data. You can either use a library such as GMP, or use a system like *Mathematica*. I opted for Sage, a Python-like language that offers arbitrary-precision integers and sports a very well adorned library of mathematical primitives. The Sage program to generate the list of primorials goes like this:

```
# generator for primes
x = (p for p in Primes())
# actually generate a list of primes
l = [x.next() for i in range(150)]
# RDF = Real Double Field (double-precision floats)
p = [RDF(log(prod(l[:i]))) for i in range(1, len(l) + 1)]
f = file('output.txt', 'w')
f.write(str(p))
f.close()
```

As you can see, Sage is very Python-like (it has its differences, but if you're OK with Python you should be OK with the code above): first, we create a generator object that iterates over the primes; second, we build a short list of them (150 primes; we explain why a bit later on); finally, we compute the natural log of each primorial.
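For readers without Sage, the same computation can be sketched in plain Python (names are mine; note that summing the logs of the primes is numerically equivalent to taking the log of the product, and avoids building the huge integer at all):

```python
from math import log

def primes_first(k):
    """First k primes via trial division (fine for k = 150)."""
    found = []
    n = 2
    while len(found) < k:
        if all(n % p != 0 for p in found):
            found.append(n)
        n += 1
    return found

l = primes_first(150)

# natural log of each primorial: ln p_1#, ln p_2#, ..., ln p_150#
logs = [sum(log(p) for p in l[:i]) for i in range(1, len(l) + 1)]

with open('output.txt', 'w') as f:
    f.write(str(logs))
```

The sum-of-logs trick is a design choice, not a requirement: Python's arbitrary-precision integers would handle the product directly, just as Sage's do.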

The third step is to plot the data and analyze it. We get:

We have the blue curve, $\ln p_n\#$, essentially equivalent (up to a constant factor) to the number of bits necessary to store $p_n\#$. The green line is $n \ln n$ (since, according to theory, we should have $\ln p_n\# \sim n \ln n$, which is more or less true, at a constant), and the red line is a scaled $n \ln n$, which locally fits pretty well.

* * *

So what does the graph tell us? That the number of bits necessary to use $p_n\#$ (the product of the first $n$ primes) is proportional to $n \ln n$ when $n$ is in the range 40 to 100 (it's a local fit, but a reasonable one, since we would not use an infinite number of primes anyway). Remember, we wanted to use $p_n\#$ to assess, to some degree of certainty, whether a number $m$ is prime or not. We have also shown before that adding primes increases the accuracy of the Las Vegas-type algorithm only slowly, and that we would therefore need a very large number of primes, which, as we have just shown, makes $p_n\#$ far too big for machine-size integers.
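To make the "some degree of certainty" concrete, here is a plain-Python sketch of the gcd-with-a-primorial idea (the exact procedure from the earlier installments may differ; `maybe_prime` is a name I made up):

```python
from math import gcd, prod

SMALL_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
PRIMORIAL = prod(SMALL_PRIMES)  # p_10#

def maybe_prime(m):
    """One gcd test against p_10#.

    Returns False when m shares a factor with the primorial
    (so m is certainly composite, unless m is one of the small
    primes themselves). Returns True when no small factor is
    found, which does NOT prove primality."""
    if m in SMALL_PRIMES:
        return True
    return gcd(m, PRIMORIAL) == 1

print(maybe_prime(91))    # False: 91 = 7 * 13, caught by the gcd
print(maybe_prime(1147))  # True, yet 1147 = 31 * 37: a false positive
```

The second call illustrates the accuracy problem: $1147 = 31 \times 37$ slips through because neither factor appears in $p_{10}\#$, and pushing the false-positive rate down requires ever more primes, hence an ever larger primorial.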

Using a large-integer library is not terribly inconvenient; but we already know that testing primes one by one is, paradoxically, *much* faster. To test with certainty a number $m$, we only need to test primes up to $\sqrt{m}$. Indeed, if the number is composite, say $m = ab$ for integers $a$ and $b$, we either have $a \le \sqrt{m}$ and therefore $b \ge \sqrt{m}$, or we have $a \ge \sqrt{m}$ and therefore $b \le \sqrt{m}$. Either way, the first factor to be found is at most $\sqrt{m}$; and the expected number of tests is roughly proportional to the $\sqrt{m}$th harmonic number (I'll show this in a few weeks). On the other hand, $p_n\#$ is a rather large number, even for not-so-big $n$.
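Trial division with the $\sqrt{m}$ cutoff can be sketched like this (plain Python; testing odd divisors rather than only primes, which is slightly wasteful but needs no precomputed prime list):

```python
def is_prime(m):
    """Deterministic trial division, stopping at sqrt(m)."""
    if m < 2:
        return False
    if m % 2 == 0:
        return m == 2
    d = 3
    while d * d <= m:  # i.e., d <= sqrt(m)
        if m % d == 0:
            return False
        d += 2
    return True

print([x for x in range(2, 30) if is_prime(x)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Note that `d * d <= m` avoids computing a floating-point square root, and the early exit on the first factor found is exactly why composite numbers are, on average, rejected very quickly.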

So, I’m now pretty convinced $p_n\#$ isn’t that helpful for testing primality.

Empirical testing is useful because it may uncover unexpected interactions that affect performance, and benchmarks let us compare an algorithm before and after a potential improvement.