Eratosuite

The other day in class, we were looking at recurrence relations and how to solve them with the characteristic equation. Among the examples I gave was a recurrence that seemed to give a good number of primes numbers. Are there many of those? How do we find them?

Well, we need two things: a method of testing if a given number is prime, and a method of generating recurrences—parameters and initial conditions. Well, make that three: we also need to generate the suite generated by the recurrence relation. Let’s go!

Let us first take time to recall what a recurrence relation is, and, more specifically, what a linear homogeneous recurrence is. A recurrence relation is a recursive expression for the terms of a sequence, in the form of

$t_n=a_1 t_{n_1}+a_2 t_{n-2}+\cdots+ a_k t_{n-k}$,

where the $a_i$ are the coefficients (not necessarily natural numbers), and the $t_i$ are the $i$th term in the sequence. The best known such recurrence relation is Fibonacci’s series:

$F_1=1$,

$F_2=1$,

$F_n=F_{n-1}+F_{n-2}$.

Fibonacci’s series defines a term using the two previous terms, therefore its order is 2. If it’d defined the terms using, say the $k$ previous terms, then its order would be $k$. For a recurrence relation of order $k$, you will need $k$ initial conditions, that is, terms that are at the beginning of the sequence and aren’t defined relative to the others. If we’d put $F_1=3$ and $F_2=17$, say, then Fibonacci’s recurrence relation generates an entirely different sequence.

Let’s look at another example. Say:

$t_1=1$,

$t_2=2$,

$t_n=2t_{n-1}+3t_{n-2}$.

The first two terms are the initial conditions, so the sequences starts 1, 2. The third term is $t_3=2t_2+3t_1=2\times{2}+3\times{1}=7$. The fourth term is $t_4=2t_3+3t_2=2\times{7}+3\times{2}=20$. And so on, infinitely.

*
* *

OK, so what interests me is finding series that give, potentially, a lot of primes numbers. We need a way of checking the terms for primality. But we also know that testing for primality is hard in general, so we’ll settle for some trade-off. One possible trade-off is to establish a boolean table where t[i] is true when i is prime and false otherwise. Fortunately, we know of a super efficient way of filling such a table!

The method is called Eratosthenes’ sieve (or some other minor variant such as the sieve of Eratosthenes). The method is simple. We start with a table where all the entries are true (except for entry zero and one). Then we find the next entry that is true. It’ll be a prime number. We then cross-out all that is a multiple of that number (except the number itself, evidently). We then find the next entry that is still true, that’ll be our next prime number. We then cross out all of its multiples from the table. We repeat until we reach numbers about the size of the square root of the table: at that point, we’re done, all entries still true are prime numbers.

A C++ implementation would be something like this:

void eratosthenes(std::vector<bool> & primes, const options_t & options)
{
primes=std::vector<bool>(options.nb_primes,true); // safe up to c++11
primes[0]=primes[1]=false;

size_t current=2;
while (current*current<primes.size())
{
// mark all multiples of current
for (size_t mark=2*current; mark<primes.size(); mark+=current)
primes[mark]=false;

// find the next 'true'
do
{
current++;
}
while (current<primes.size() && !primes[current]);
}
}


Let us note the use of the “option pattern”, where a configuration is passed throughout the program using a reference to a structure (or class) holding the configuration. This avoids passing a lot of parameters to each call, especially when non mission-specific items are considered. It’s not conceptually the problem of the sieve to know if we’re in debug/verbose mode or not, so we might as well hide all these details into an
options_t structure. That “option pattern” isn’t found in the GoF,
and is neither state nor memento, although
both come close, kind of.

*
* *

To search for interesting recurrences, we must search for combinations of initial conditions and coefficients. We could generate them randomly, and after a large enough number stop and pick the best sequences, those that have the most prime numbers in them. We could also try exhaustively all combinations. This suggests nested loops, were, for all possibilities for the first initial condition we try all possibilities for the second, and for all possibilities for the second, the third, then for all possibilities of initial conditions, we try all possibilities for the first coefficients, and so on, and so on.

If we have a fixed depth, say two initial conditions and two coefficients, we end up with four nested loops. If we want to put one more term in the recurrence, we must now have six nested loops, three for the initial conditions and three for the coefficients. This approach is most inconvenient because we must have special cases for every possible recurrence order. We must find something more general… and, if possible, not too painful to code!

Well, we can look at the initial conditions and at the coefficients, as counters. If we limit the initial conditions values to 0 to 9, say, then all possible combinations will be like counting from 00 to 99: 00, 01, 02, …, 08, 09, 10,… We will take the same approach but count “digits” from a lower to a upper bound rather than from 0 to 9. But we will proceed in the same way. We start at the lower bound, and increment the first value until it exceeds the upper bound. It is then reset to the lower bound, and a “carry” is pushed to the next value that is incremented. If the next value exceeds its upper bound, it is also reset and a carry is propagated to the next. That’s exactly what we do when we do +1 on a (base 10) number:if the first digit becomes 10, we write 0, and propagate the carry to the next. If the carry is zero (no carry is pushed) then we’re done.

The other nice thing with this approach is that we don’t really need to know before hand how many combinations there are. When a wrap-around occurs, the counter reverts to its initial configuration, with all “digits” being set at their lower bounds. It suffices then to do +1 until we come back to “all zeros”.

The following piece of code implements the counter:

void start_counter(std::vector<int64_t> & counter,
const std::vector<std::pair<int64_t,int64_t>> & bounds)
{
counter.resize(bounds.size());
for (size_t i=0;i<bounds.size();i++)
counter[i]=bounds[i].first;
}

////////////////////////////////////////
void increment(std::vector<int64_t> & counter,
const std::vector<std::pair<int64_t,int64_t>> & bounds)
{
int64_t c=1;
size_t i=0;
while (c && (i<bounds.size()))
{
counter[i]+=c;
if (counter[i]>bounds[i].second)
{
counter[i]=bounds[i].first;
i++;
}
else
c=0;
}
}


*
* *

Once we’re able to generate all possible initial conditions and coefficient combinations, we need a way of generating the series, and evaluate its score. If it has too few different values it’s bad, it doesn’t have many primes, it’s bad. Something like:

int64_t dot( const std::vector<int64_t> & a,
const std::vector<int64_t> & b )
{
// computes dot product of two vectors
int64_t s=0;
for (size_t i=0;i<a.size();i++)
s+=a[i]*b[i];
return s;
}

////////////////////////////////////////
size_t eval_series( const std::vector<int64_t> & initial_conditions,
const std::vector<int64_t> & parameters,
const std::vector<bool> & primes,
const options_t & options)
{
std::vector<int64_t> terms(initial_conditions);
std::vector<int64_t> state=initial_conditions;

for (size_t i=state.size();i<options.nb_terms;i++)
{
int64_t v=dot(state,parameters);
state.erase(state.begin());
state.push_back(v);
terms.push_back(v);
}

// checks

// very few values?
std::set<uint64_t> s(terms.begin(),terms.end());
if (s.size() < terms.size()/4)
return 0;

// count primes
size_t p=0;
for (int64_t x:s)
p+=is_prime(x,primes);

return p;
}


*
* *

All that’s left is to explore exhaustively all combinations, evaluate all series, and pick the best (at least as measured by our naïve score function). That would give something like:

void search( const std::vector<bool> primes,
const options_t & options)
{
std::vector<int64_t> initial_conditions, first_initial_conditions;
std::vector<int64_t> parameters, first_parameters;

std::vector<int64_t> best_initial_conditions;
std::vector<int64_t> best_parameters;
size_t best_score=0;

start_counter(first_initial_conditions,options.initial_conditions_bounds);
start_counter(first_parameters,options.parameter_bounds);

initial_conditions=first_initial_conditions;
do
{
parameters=first_parameters;
do
{
if (options.verbose)
show_series(initial_conditions,
parameters,
primes,
options);

size_t this_score=eval_series(initial_conditions,
parameters,
primes,
options);

if (this_score>best_score)
{
best_score=this_score;
best_initial_conditions=initial_conditions;
best_parameters=parameters;
}

increment(parameters,options.parameter_bounds);
}
while (parameters!=first_parameters);

increment(initial_conditions,options.initial_conditions_bounds);
}
while (initial_conditions!=first_initial_conditions);

show_series(best_initial_conditions,
best_parameters,
primes,
options);
}


Running the program produces the output:

.../eratostuite> eratosuite -p 10000  --bounds -6,6,-6,6,-6,6 -f 50
Score: 33/50
initial conditions: -5 5 1
parameters        : -1 1 1
-5 5 1 11 7 17 13 23 19 29 25 35 31 41 37 47 43 53 49 59 55 65 61 71 67 77 73 83 79 89 85 95 91 101 97 107 103 113 109 119 115 125 121 131 127 137 133 143 139 149


I have used Boost’s program_options framework to parse the arguments. The -p options specifies how large the prime look-up table will be, and the --bounds specifies, for each coefficient, the range of value to be searched (for now, I consider them as integer). I also put an option to define the bounds for the initial conditions, but if it’s not specified, then the bounds for the coefficients are assumed.

*
* *

The main problem with this program is limited precision arithmetic. I used int64_t to use 64-bits integers, but some series will grow very fast and, due to limited precision, give erroneous terms. The next step would be to replace int64_t by some arbitrary-precision integer object. In turn, it will likely make testing for prime numbers a lot more difficult.

Another problem with the counter-based approach is that it does not lend itself that easily to parallelism. Using threads (say, Boost’s), we would need to schedule sequences to be checked in parallel, first generating a batch of configurations from the counters, then dispatching. Using OpenMP would not make things a lot easier.