Euclid and Primality Testing (III)

So in previous installments, we looked at how to use the euclidean algorithm to devise a Las Vegas-type algorithm for primality testing. We also found that, in general, simply testing factors one after the other is much more efficient (but that doesn’t mean that there are not better algorithms to test for primality!).


We also considered only relatively small primorials (the products of the first n prime numbers) since they rapidly exceeded 2^{32}. But just how fast do primorials grow?

The first step is to define the primorials, P_n (some use P_n\#, which I find most horrible). P_n is defined as

P_n=\displaystyle\prod_{k=1}^n p_k

with p_k being the kth prime number.

The second step is to compute primorials, gather data. You either use a library such as GMP, or use a system like Mathematica. I opted for Sage, a Python-like language that offers arbitrary-precision integers and sports a very well adorned library of mathematical primitives. The Sage program to generate the list of primorials goes like this:

# generator for primes
x=(p for p in Primes())

# actually generate a list of primes
l=[ for i in range(150) ]

# RDF = approximation of Read to DoubleFloat (or something like that)
p=[ RDF(log(prod(l[:i]))) for i in range(1,len(l)+1) ]

f = file('output.txt','w')

As you can see, Sage is very Python-like (it has its differences but if you’re OK with Python you should be OK with the code above): first, we create a generator object that iterates over the list of all primes; second, we instantiate a short list (150 primes, we explain why a bit later on), then the log (base-e) of the primorials.

The third step is to plot the data and analyze it. We get:


We have the blue curve, the \log_2 P_n, essentially equivalent to the number of bits necessary to store P_n. The green line is n (since, according to theory, we should have that \log P_n \sim n, which is more or less true, at a constant) and the red line is 8.5n-120 which locally fits \log_2 P_n pretty well.

* *

So what does the graph tell us? That the number of bits necessary to use P_n (the product of the n first primes) is proportional to 8.5n-120 when n in in the range 40 to 100 (it’s a local fit, but reasonable since we would not have an infinite number of primes anyway). Remember, we wanted to use \gcd(a,P_n) to assess, to some degree of certainty, whether a is prime or not. We also have shown before that adding primes only augmented the Las Vegas-type algorithm accuracy slowly, and that therefore we would need a very large number of primes, which, as we just shown, makes P_n far too big for machine-size integers.

Using a large integer library is not terribly inconvenient; but we already know that testing primes one by one is, paradoxically, much faster. To test with certainty a number m, we need only to test primes up to O(\sqrt{m}). Indeed, if the number m is composite, say m=ab for integer a and b, we either have a>\sqrt{m} and therefore b\leqslant\sqrt{m}, or we have b>\sqrt{m} and therefore a\leqslant\sqrt{m}. Either way, the first factor to be found is at most O(\sqrt{m}); and the expected number of tests is roughly proportional to the mth harmonic number—I’ll show this in a few weeks. On the other hand, P_{\sqrt{m}} is a rather large number, even for not-so-big m.

So, I’m now pretty convinced \gcd isn’t that helpful for testing primality.

One Response to Euclid and Primality Testing (III)

  1. Elizabeth X. Conley says:

    Empirical testing is useful because it may uncover unexpected interactions that affect performance. Benchmarks may be used to compare before/after potential improvements to an algorithm after program optimization.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: