## Generating Random Sequences (part I)

Every once in a while, we need a random sequence. Whether to test a data structure’s performance or to run probabilistic unit tests, the provided rand primitive from your favorite programming language has several limitations. First, it’s been known for a while that if most implementations of the C standard library rand() function are not very random, despite being “good enough” in great many cases. Second, and more importantly, it does not allow you to easily control the period nor to generate a permutation on $0\ldots n-1$, for example.

There are many methods of generating pseudo-random number sequences. Not all exhibit the same properties and, accordingly, a method may be more useful in one case and perfectly useless in another. High quality pseudo-random number generation is a notoriously hard endeavor, but there is a number of very simple algorithms that will get you out of trouble for certain specific tasks. Let us consider, for example, the example where the pseudo-random generator must generate the numbers in $0\ldots{}n-1$ exactly once, in a random order, of course, in exactly $n$ draws.

Let’s say, then, that for some reason, you have to generate a random-looking permutation of the numbers $0\ldots{}n-1$. The first approach one may think of is to generate an array with the $n$ numbers in it and randomize its contents. The array is then scanned from left to right and all the numbers are drawn one by one.

This approach, although simple, suffers from two important drawbacks. The first is that you must allocate the space for $n$ numbers and initialize the array. This is all cute when $n=100$, but a lot less when $n=2^{32}$ or more: you may not be able to fit the list in memory at all. The second is that after the allocation of the array, you still need to shuffle the list quite well so that it looks random enough. This, in turn, asks for a good pseudo-random number generator and an efficient shuffling algorithm. Even the Fisher-Yates Shuffle asks for $O(n)$ operations. A C99 implementation would look like:

void swap_int(int * a, int * b)
{
int t=*a;
*a=*b;
*b=t;
}

bool fisher_yates(int n, int ** array)
{
(*array)=(int *)malloc(n * sizeof(int));

if (*array)
{
// let us fill the array
for (int i=0;i<n;i++) (*array)[i]=i;

// fisher-yates loop
for (int i=n-1;i;i--)
{
int j=rand() % i; // 0...i-1
swap_int( *array+i, *array+j);
}
return true;
}
return false;
}


This seems nice, but 1) calls on rand() quite a lot and 2) must allocate a size $n$ array of integers. Shrinking the integer type to uint16_t if $n$ is small enough is not much of an optimization. What we need, is a method that generates a permutation without holding such a large internal state. Better yet, we need something that has storage state in $O(\lg n)$ or less.

Turns out that we can do that, provided that we add a bit of initialization time—and not much of it. We chose $n$, the size of the array—which should now be called the length of the sequence. We chose a number $\rho$ that is relatively prime to $n$. That is, $n$ and $\rho$ share no common factors. Having $\rho$ just prime also works, but that’s not necessary to use that strong a condition; relatively prime suffice quite enough. Third, choose $x_0$, the seed, to be in $0\ldots{}n-1$. Maybe initialise it with time() % n or some similar method1.

The sequence is generated by the following iteration:

$x_0 = seed$
$x_t = \left(x_{t-1}+\rho\right)\mod n$

So this algorithm will generate a permutation of $0\ldots{}n-1$ using only $\rho$ and $x_{t-1}$, bothintegers being $O(\lg n)$ bits long.

The C99 implementation of this algorithm would look like:

typedef
struct
{
int x,   // the current state
n,   // the size of the array
rho; // and the relatively prime number
} generator_t;

int get_next( generator_t * gen )
{
return gen->x=(gen->x+gen->rho) % gen->n;
}


and it could be called:

  generator_t gen={4,20,13}; // seed, size, rho

for (int i=0;i<20;i++)
printf(" %d",get_next(&gen));
printf("\n");


producing the most cromulent output:

17 10 3 16 9 2 15 8 1 14 7 0 13 6 19 12 5 18 11 4


*
* *

So, how does this generator works exactly? The fact that $\rho$ is relatively prime to $n$ plays a key role in the generator. The generator has been presented as a recurrence $x_t=(x_{t-1}+\rho)\mod{}n$ but $x_t$ can be rewritten as $x_t = (x_0+t\rho)\mod{}n$, where $x_0$ is the seeded value.

Let us show (with rather much handwaving) that the generator has a period of $n$ and that it covers all the values $0\ldots{}n-1$. Let us consider the seed to be zero for now; the generator becomes $x_t=t\rho\mod{}n$. Since $\rho$ is relatively prime to $n$, the values of $t$ that solve $t\rho\equiv{}0~(\!\!\!\!\mod{}n)$ are multiples of $n$ (including 0). Therefore, you’ll have to iterate $n$ times to fall back on 0; values $1\leqslant{}t all give non-zero residuals. Thus demonstrating that the period of the generator is indeed $n$.

Now, let us show that the $n$ values generated by varying $t$ from 0 to $n-1$ are all distinct. Suppose we have $t$ and $s$, $0\leqslant{}s,t. We can show that if $t\rho\equiv{}s\rho~(\!\!\!\!\mod{}n)$, $s=t$, exactly because $0\leqslant{}s,t\leqslant{}n-1$. Indeed: if $s=t+a$ for some $a$, then $t\rho\equiv{}(t+a)\rho~(\!\!\!\!\mod{}n)$, which implies that $a=0$, so that $s=t$.

If you’re using a non zero seed, you can redo the above adding $x_0$ in the proof; the net resulting being of “rotating” the sequence by an offset given by the seed $(\!\!\!\!\mod{}n)$.

*
* *

The choice of $\rho$ is a difficult one. First, if $\rho\equiv\pm{}1\mod{}n$, the sequence generated is merely a count (if $\equiv{}+1$) from the seed wrapping at $n$ and finishing at the seed minus one, or a decount (if $\equiv{}-1$), which isn’t really random. Same if $\rho\equiv\pm{}2\mod{}n$, except that we’re (de)counting in step of two. For the sequence to look random enough, $\rho$ must be such that $\rho \approx (\phi-1){}n$, where $\phi$ is Phidias‘ constant, the golden ratio. This celebrated constant is given by:

$\phi = 1.6180339887\ldots$

Can you see why? I’ll let you think this one out for a while2 :)

1 Personally, I favor the Linux-specific function gettimeofday that returns the time in seconds and in microseconds. Using the complete time with microsecond gives a rather good time-dependant seed.

2 Or maybe $\displaystyle 1-\frac{1}{\phi}\leqslant{}\rho{}\leqslant{}\phi{}-1$ does quite the job?

### 7 Responses to Generating Random Sequences (part I)

1. […] Points in a Triangle (Generating Random Sequences II) In the first post of this series, I discussed a method to generate a random (guaranteed) permutation. On another […]

2. Farooq says:

The Fisher-Yates method generates each of the n! permutations with equal probability. I see why your method works for generating a permutation of {0, 1, … n-1}, but I don’t see how you can use this method to fairly generate any random sequence.

• Steven Pigeon says:

The method I propose has the significant advantage of not requiring a linear-size storage to generate the permutation. Additionnally, choosing a random seed and a random coprime will let you access a great number of random permutations.

• Farooq says:

Sure. I think it is a good method for what it does, but it is slightly an apples to oranges comparison because it is solving a different problem than Fisher-Yates.

3. […] the first post of this series, I discussed how to generate permutations of sequences using the Fisher-Yates method […]

4. […] a number of previous installment of this series, I’ve discussed permutations, generating uniform random points in a triangle, on a sphere, the inversion method, even recycling […]