## Euclid, Prime numbers, and the Inclusion-Exclusion Principle

The Euclidean algorithm, the algorithm to compute the greatest common divisor, or $\gcd$, of two numbers, dates back to antiquity (obviously). We can use it to make a fast test for primality, at least up to some confidence—and that’s where the inclusion-exclusion principle comes in.

Let us begin with the Euclidean algorithm. Originally, the algorithm was done by successive subtraction (which quickly becomes tedious as numbers grow), but the modern version, at least the one that I know of, uses divisions and remainders (modulo), and computes the $\gcd(a,b)$ of two numbers $a$ and $b$ in $O(\lg\min(a,b))$ steps (counting division as an $O(1)$ operation), which is very fast.

In C/C++, a minimal implementation of Euclid’s algorithm looks like:

```cpp
unsigned gcd(unsigned a, unsigned b)
{
    while (b)
    {
        unsigned t = b;
        b = a % b;
        a = t;
    }

    return a;
}
```


It takes a while to figure out how Euclid’s algorithm works if we don’t approach it from the right angle. Maybe the easiest way of understanding it is recursively. If we have $a=qb+r$, then we also have $\gcd(a,b)=\gcd(b,r)$, because any common factor $k$ (possibly 1) of $a$ and $b$ must also divide $r=a-qb$.
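The recursive view translates directly into code; here is a minimal sketch (the function name `gcd_rec` is mine, to distinguish it from the iterative version above):

```cpp
// Recursive restatement of Euclid's algorithm:
// gcd(a, b) = gcd(b, a mod b), with gcd(a, 0) = a as the base case.
unsigned gcd_rec(unsigned a, unsigned b)
{
    return b ? gcd_rec(b, a % b) : a;
}
```

The iterative version in the listing above is just this recursion with the tail call unrolled into a loop.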

How do we use that for primality testing? Using $\gcd(a,b)$, we can only test whether $a$ and $b$ share factors. If $\gcd(a,b)>1$, then $a$ and $b$ share a factor, and a number with a proper factor is composite. If they don’t share any, $\gcd(a,b)=1$, but we can’t conclude that either is prime or composite: they can both be composite without sharing factors, like $2\times{}3\times{}5$ and $7\times{}11\times{}13$. They need further testing.

We can use this to devise a Las Vegas algorithm. A Las Vegas algorithm is an algorithm that either answers “Yes, for sure” (with a proof) to a question, or “I don’t know”. We want an algorithm that answers “Yes, it’s composite” correctly as often as possible, and answers “I don’t know” as rarely as possible (because further testing is more expensive). The key to getting such an algorithm is to choose a good $b$ against which to test all the $a$s.

But how? Well, $b$ has to be a number with the greatest possible number of distinct prime factors. In other words, we must choose $b=2\times{}3\times{}5\times{}7\times\ldots$, going as far in the series of prime numbers as the unsigned (or whatever integer type we use) allows. With a 32-bit-wide unsigned, we have $b=2\times{3}\times{5}\times{7}\times{11}\times{13}\times{17}\times{19}\times{23}=223092870$. Any larger and it overflows: multiplying in the next prime, $29$, we arrive at $6469693230$, larger than $2^{32}-1$.
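As a sanity check, computing the product directly (the array name `primes` and the helper `primorial` are mine):

```cpp
// The nine primes whose product still fits in a 32-bit unsigned.
static const unsigned primes[] = {2, 3, 5, 7, 11, 13, 17, 19, 23};

// Product of the primes up to 23; the next one, times 29,
// would exceed 2^32 - 1.
unsigned primorial()
{
    unsigned p = 1;
    for (unsigned i = 0; i < sizeof(primes) / sizeof(primes[0]); ++i)
        p *= primes[i];
    return p; // 223092870
}
```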

With $b$ defined, we have a Las Vegas type primality/composite testing algorithm.

* * *

What about the confidence in the algorithm given by $b$? That’s where the inclusion-exclusion principle comes in. If we have only $b=2$, clearly only half of the integers test correctly—the even ones—and the other half test inconclusively. If we have $b=2\times{3}$, it is not $\frac{1}{2}+\frac{1}{3}$ of the numbers that test correctly, but only $\frac{1}{2}+\frac{1}{3}-\frac{1}{6}=\frac{2}{3}$. Why $-\frac{1}{6}$? We counted the numbers evenly divided by $2$ and those evenly divided by $3$, but the numbers evenly divided by $6$ were counted twice, so we have to remove them once.
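We can verify the two-factor count by brute force (the helper `count_div_2_or_3` is mine, for illustration only):

```cpp
// Count how many of 1..N are divisible by 2 or by 3. For N a multiple
// of 6, inclusion-exclusion predicts N/2 + N/3 - N/6 = 2N/3 of them.
unsigned count_div_2_or_3(unsigned N)
{
    unsigned c = 0;
    for (unsigned n = 1; n <= N; ++n)
        if (n % 2 == 0 || n % 3 == 0)
            ++c;
    return c;
}
```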

For three factors, we already have

$\frac{1}{2}+\frac{1}{3}+\frac{1}{5}-\left(\frac{1}{6}+\frac{1}{10}+\frac{1}{15}\right)+\frac{1}{30}=\frac{22}{30}$.

The method generalizes for any number of factors, but it gets very messy very rapidly.
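Rather than writing the sum out by hand, a short program can enumerate all non-empty subsets of the first $k$ primes and apply the signs mechanically (the function name `portion_composite_detected` is mine):

```cpp
// Inclusion-exclusion over all non-empty subsets of the first k primes:
// a subset S of size |S| contributes (-1)^(|S|+1) / prod(S) to the
// portion of numbers sharing a factor with the primorial.
double portion_composite_detected(int k)
{
    const int primes[] = {2, 3, 5, 7, 11, 13, 17, 19, 23};
    double total = 0.0;
    for (int mask = 1; mask < (1 << k); ++mask)
    {
        long long prod = 1;
        int bits = 0;
        for (int i = 0; i < k; ++i)
            if (mask & (1 << i)) { prod *= primes[i]; ++bits; }
        total += (bits % 2 ? 1.0 : -1.0) / prod;
    }
    return total;
}
```

Equivalently, by complementation, the same value is $1-\prod_{p}\left(1-\frac{1}{p}\right)$ over the chosen primes, which avoids the exponential subset enumeration.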

From the first few cases ($2$, $2\times{3}$, $2\times{3}\times{5}$) we have that $\frac{1}{2}$, $\frac{2}{3}$, and $\frac{22}{30}$ of the numbers test rapidly as composite. But the last two seem to indicate that the proportion of numbers that test correctly does not increase very much when we add more factors. Let’s see:

| Factors | Portion tested correctly |
|--------:|-------------------------:|
| 1 | 0.5 |
| 2 | 0.666667 |
| 3 | 0.733333 |
| 4 | 0.771429 |
| 5 | 0.792208 |
| 6 | 0.808192 |
| 7 | 0.819475 |
| 8 | 0.828976 |
| 9 | 0.836412 |

Graphically: [figure: portion tested correctly as a function of the number of factors]

And as we add more and more factors, the accuracy grows, but it never reaches one: to reach one, we would need all the primes—impossible, since there are infinitely many of them.

* * *

So a Las Vegas algorithm to test for primality using this method is bound by the number of prime factors one can fit in the product. Using 32-bit integers, we can only get as far as 9 factors before overflowing. With 64 bits, we could go much further, but the real question is how computing $\gcd(a,b)$ compares with just successively testing divisibility by $2$, then $3$, then … ?

Let’s find out next time!
