## Euclid, Primes numbers, and a Bit of Algorithm Analysis (Part IV)

While we looked at using the GCD for devising a Las Vegas-type algorithm to test for the primality of a number in a previous entry, we saw that actually testing the number against a list of primes (therefore assuming we have an arbitrarily long list of primes), one prime at a time, is much faster.

The GCD method, for divisor $d$, tests a number in $O(1+\frac{1}{2}\log_2 d)$ steps, but we haven’t determined the complexity of the other method. Let’s do that now.

Let’s recapitulate: The expected complexity (upper bounded by) is:

$C_n=\displaystyle 1+ (1-\frac{1}{2})\left(1+(1-\frac{1}{3})\left(1+(1-\frac{1}{5})\left(1+(1-\frac{1}{7})\left(1+\cdots\right)\right)\right)\right)$,

with $n$ terms. How does it work? The algorithm does one iteration in all cases, with one factor. With two factors, it checks the first, and there’s a chance that it doesn’t stop there because the test failed; it continues another iteration. The probability that it continues after testing 2 is $(1-\frac{1}{2})$, because half of the numbers caused the algorithm to stop at the first iteration. Of the $(1-\frac{1}{2})$ remaining, an iteration cost of at least 1 is incurred (testing the next factor) and there’s a probability (compound, that’s why it’s a product form) of $(1-\frac{1}{3})$ that 3 did not divide it; yielding yet another iteration with probability $(1-\frac{1}{2})(1-\frac{1}{3})$ incurring cost of at least 1 to test for five, with the probability of $(1-\frac{1}{5})$ that it continues.

Written as above, the complexity $C_n$ is rather cumbersome. Let’s rewrite it:

$C_n=\displaystyle 1+ (1-\frac{1}{2}) (1+(1-\frac{1}{3}) (1+(1-\frac{1}{5}) (1+(1-\frac{1}{7}) (1+ \cdots$

$=\displaystyle 1+\frac{1}{2}(1+\frac{2}{3}(1+\frac{4}{5}(1+\frac{6}{7}(1+\frac{10}{11}(1+\frac{12}{13}(1+\cdots$

$=\displaystyle 1+\frac{1}{2}+\frac{2}{6}+\frac{8}{30}+\frac{48}{210}+\frac{480}{2310}+\frac{5760}{30030}+\cdots$

Note that the fractions are not reduced, that is, you have $\frac{2}{6}$ instead of $\frac{1}{3}$. At first, I had the reflex of simplifying the fractions too, but it sometimes hinders insight. In this form, we can use the numerators $1,1,2,8,48,480,\ldots$ in Sloane’s On-Line Encyclopedia of Integer Sequences to find that it’s A005867. Doing the same with the denominators, we find that they’re A002110, the primorials.

Therefore the numerators $\{a_i\}_{i=1}^n$ are given by:

$\displaystyle a_i=\prod_{j=1}{i}(\pi_j-1)$,

where $\pi_j$ is the jth prime number, starting at $\pi_1=2$. The series $\{b_i\}_{i=1}^n$ of denominators, the primorials are given by:

$\displaystyle b_i=\prod_{j=1}^i \pi_j$.

Finally,

$\displaystyle C_n = \sum_{i=1}^n \frac{a_i}{b_i} =\sum_{i=1}^n \frac{\prod_{j=1}^i (\pi_j-1)}{\prod_{j=1}^i \pi_j}$.

This expression is correct but it’s unwieldy. If we want to approximate it, we can notice (well, after some work, it just doesn’t show up by itself!) that:

$\displaystyle \frac{\prod_{i=1}^k(\pi_i-1)}{\prod_{i=1}^k \pi_i} \approx \frac{k^{k-1}}{k^{k+\frac{1}{\sqrt{2}}}}=k^{-1-\frac{1}{\sqrt{2}}}$,

and, using that result, we have

$\displaystyle C_n \approx \sum_{i=1}^n i^{-1-\frac{1}{\sqrt{2}}} = H_n^{1-\frac{1}{\sqrt{2}}}$,

where $H_n^{1-\frac{1}{\sqrt{2}}}$ is the nth generalized harmonic number of order $1-\frac{1}{\sqrt{2}}$.

*
* *

The GCD-based algorithm is $O(\lg d)$ where $d$ is the primorial we use for testing. The naïve algorithm is $O(\lg n)$, the index of the primorial! It’s basically a $O(\lg \lg d)$ algorithm when we measure both against $d$. That’s why it’s faster.

*
* *

Above, I made you notice that I did not use simplified fractions, despite it being too often my first reflex to do so. Simplifying prematurely may yield very strange looking series, and prevent you from finding anything useful in Sloane’s Encyclopedia. Because chances are, someone before you managed to create that same series—maybe in an entirely different context—and added it to the database. And it’s especially useful when you work with primes, a notoriously difficult subject.