Euclid and Primality Testing (III)

So in previous installments, we looked at how to use the Euclidean algorithm to devise a Las Vegas-type algorithm for primality testing. We also found that, in general, simply testing candidate factors one after the other is much more efficient (which does not mean there are no better algorithms for primality testing!).

We also considered only relatively small primorials (the products of the first $n$ prime numbers) since they rapidly exceeded $2^{32}$. But just how fast do primorials grow?

The first step is to define the primorials, $P_n$ (some use $P_n\#$, which I find most horrible). $P_n$ is defined as

$P_n=\displaystyle\prod_{k=1}^n p_k$

with $p_k$ being the $k$th prime number.
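To make the definition concrete, here is a minimal plain-Python sketch (outside Sage) that computes the first few primorials; the trial-division prime generator is my own stand-in for a proper prime sieve:

```python
def first_primes(n):
    """Return the first n primes by trial division against earlier primes."""
    found = []
    candidate = 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found

def primorial(n):
    """P_n: the product of the first n primes."""
    result = 1
    for p in first_primes(n):
        result *= p
    return result

print([primorial(n) for n in range(1, 6)])  # → [2, 6, 30, 210, 2310]
```

Python's arbitrary-precision integers mean `primorial` never overflows, which is exactly why we can use it to explore growth beyond $2^{32}$.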

The second step is to compute the primorials and gather data. You can either use a library such as GMP, or use a system like Mathematica. I opted for Sage, a Python-like language that offers arbitrary-precision integers and sports a well-stocked library of mathematical primitives. The Sage program to generate the list of primorials goes like this:

# generator for primes
x = (p for p in Primes())

# actually generate a list of the first 150 primes
l = [next(x) for i in range(150)]

# RDF is Sage's Real Double Field: it converts exact values
# to double-precision floats
p = [RDF(log(prod(l[:i]))) for i in range(1, len(l) + 1)]

f = open('output.txt', 'w')
f.write(str(p))
f.close()


As you can see, Sage is very Python-like (it has its differences, but if you're OK with Python you should be OK with the code above): first, we create a generator object that iterates over all primes; second, we instantiate a short list of them (150 primes, we explain why a bit later on); then we compute the natural log of each primorial.

The third step is to plot the data and analyze it. We get:

We have the blue curve, $\log_2 P_n$, essentially the number of bits necessary to store $P_n$. The green line is $n$ (a first-order guide: by the prime number theorem, $\ln P_n\sim p_n\sim n\ln n$, so the true growth is slightly superlinear), and the red line is $8.5n-120$, which locally fits $\log_2 P_n$ pretty well.
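As a sanity check outside Sage, here is a plain-Python sketch that counts the bits of $P_n$ directly (the trial-division prime generator is my own stand-in for Sage's `Primes()`) and prints them alongside the red line $8.5n-120$ from the plot:

```python
def first_primes(n):
    """First n primes by trial division (a stand-in for Sage's Primes())."""
    found = []
    candidate = 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found

def primorial_bits(n):
    """Number of bits of P_n, i.e., floor(log2 P_n) + 1."""
    P = 1
    for p in first_primes(n):
        P *= p
    return P.bit_length()

# compare the measured bit count with the local linear fit 8.5n - 120
for n in (40, 60, 80, 100):
    print(n, primorial_bits(n), 8.5 * n - 120)
```

Using `bit_length()` sidesteps floating-point logs entirely: it gives the exact storage size of the arbitrary-precision integer $P_n$.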

*
* *

So what does the graph tell us? That the number of bits necessary to store $P_n$ (the product of the first $n$ primes) grows like $8.5n-120$ when $n$ is in the range 40 to 100 (it's a local fit, but a reasonable one, since we would not use an immense number of primes anyway). Remember, we wanted to use $\gcd(a,P_n)$ to assess, to some degree of certainty, whether $a$ is prime or not. We have also shown before that adding primes improves the accuracy of the Las Vegas-type algorithm only slowly, and that we would therefore need a very large number of primes, which, as we have just shown, makes $P_n$ far too big for machine-size integers.
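To make the trade-off concrete, here is a small plain-Python sketch of the $\gcd$-based test as described above; the numbers 221 and 3127 are illustrative examples of my own choosing:

```python
from math import gcd

# P_15: the product of the first 15 primes (all primes up to 47)
P = 1
for p in [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]:
    P *= p

def probably_prime(a, primorial):
    """gcd(a, P_n) > 1 means a shares a prime factor with the first n
    primes (so a is composite whenever a exceeds the largest of them);
    gcd(a, P_n) == 1 only says no small prime divides a."""
    return gcd(a, primorial) == 1

print(probably_prime(221, P))   # → False: 221 = 13 * 17 is caught
print(probably_prime(3127, P))  # → True: 3127 = 53 * 59 slips through
```

The second call shows the one-sided error: a composite whose prime factors all exceed $p_n$ is wrongly reported as "probably prime", which is exactly why the accuracy grows so slowly with $n$.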

Using a large integer library is not terribly inconvenient; but we already know that testing primes one by one is, paradoxically, much faster. To test a number $m$ with certainty, we only need to test primes up to $O(\sqrt{m})$. Indeed, if the number $m$ is composite, say $m=ab$ for integers $a$ and $b$, we either have $a>\sqrt{m}$ and therefore $b\leqslant\sqrt{m}$, or we have $b>\sqrt{m}$ and therefore $a\leqslant\sqrt{m}$. Either way, the first factor to be found is at most $O(\sqrt{m})$; and the expected number of tests is roughly proportional to the $m$th harmonic number (I'll show this in a few weeks). On the other hand, $P_{\pi(\sqrt{m})}$, the product of all primes up to $\sqrt{m}$, is a rather large number, even for not-so-big $m$.
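The square-root argument above translates directly into code. This is a minimal sketch that divides by every candidate $d\leqslant\sqrt{m}$ rather than only by primes, a simplification that only costs extra divisions, not correctness:

```python
def is_prime(m):
    """Trial division: if m is composite, some factor is at most sqrt(m),
    so stopping at d * d > m is enough for certainty."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

print(is_prime(97))  # → True
print(is_prime(91))  # → False (91 = 7 * 13)
```

Note that the loop condition `d * d <= m` avoids computing a floating-point square root, so the test stays exact for arbitrarily large integers.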

So, I’m now pretty convinced $\gcd$ isn’t that helpful for testing primality.
