## Making a good random table

I am still experimenting with hash functions, and I was toying with the Zobrist hash function[1] that is best known for its use in chess engines. The hash function is conceptually simple: you need a large table of random numbers, indexed, in a chess application, by the position on the board of the piece and by the piece itself. To compute a hash for a whole board configuration, you simply xor all the random numbers together. The hard part is choosing the random numbers.

Indeed, computing the hash function, once we do have a table, is
straightforward:

```hash_uint_t zobrist(const char buffer[], size_t n)
{
#include "zobrist.table"

// necessary so that we're immune to char being
// signed char, which is implementation-specific
//
const unsigned char * u=(const unsigned char*)buffer;

hash_uint_t h=0;
for (size_t i=0;i<n;i++)
h^=zobrist_numbers[i % zobrist_blocks][u[i]];

return h;
}
```

In the above, zobrist_blocks is defined so to corresponds to the number of “squares”… it would be 64 in a chess game. Here, it can be defined to whatever you want, but it should be longer than your average hashed object. In general, however, object size is unconstrained, and so you have to choose some reasonable trade-off. Maybe 16. Or 32. Your pick.

Now, how do we generate good random numbers? If you’re expecting a 64 bits value, they also should be 64 bits random numbers. Also, the more random they are, the better. So that kind of excludes the basic std::rand() function that isn’t necessarily 64 bits on your system, and isn’t very random either. So, what other source of random numbers do we have? Well, if you’re using a Linux box: /dev/urandom.

The /dev/urandom pseudo-device is used as a file and contains (or rather spews) random bits, generated from a strong generator. They’re crypto-grade bits. The only thing left is to exploit it to build our table. A simple Bash script is ideally suited for the task:

```#!/usr/bin/env bash

hd /dev/urandom | \
(
echo static const size_t zobrist_blocks=16\;
echo static uint64_t zobrist_numbers[zobrist_blocks][256]=
echo {
for ((offset=0;offset<16;offset++))
do
echo " {"
for ((row=0;row<32;row++))
do
echo -n "  "
for ((col=0;col<4;col++))
do
#echo \$line
a=\$(echo \$line | cut -d\  -f 2-9 | sed s/\ //g )
b=\$(echo \$line | cut -d\  -f 10-17 | sed s/\ //g )
echo -n 0x\$a,0x\$b,
done
echo
done
echo " },"
done
echo }\;
)
```

The script used hd (hex display) to convert the 8-bit binary output of /dev/urandom. This output is parsed so to extract columns 2 to 9 for the first 64-bits value, and 10 to 17 for the second one. If you’d rather have a 32 bits table, they you’d have to cut the line differently, for columns 2-5, 6-9, 10-13 and 14-17. The sed simply removes spaces. The rest of the script is pretty much pretty-printing so that it’s not a horrible mess to read:

```static const size_t zobrist_blocks=16;
static uint64_t zobrist_numbers[zobrist_blocks][256]=
{
{
0x07aa62721d78e86b,0xf1ef92ff361fb42f,0x4dcb4c38b28876ea,0x8e8281d7b798c831,0x5c00eb300ab1f9ce,0x4d3bb8046164f468,0x0b8e67923e43a964,0x352135fa09865d4e,
0xc8da78692e2f7d4f,0x2d3ef72c2ef6ac91,0x1048a35bafb842c7,0x17ac0a45fe84dbd2,0x1e85fe3ea6ba2cf2,0xc615639008199795,0x7c611ae764bb1e42,0x4eb161b231fdab9b,
0xb0133a01e6bff340,0xc0203f64d4cebf2d,0x42c6864cd34c58e1,0x9a06ccc4b555e336,0x80c7e26a7a4e7017,0x8acda6eb63ed62c3,0x601f258a40bee12e,0x1a0c35b55ec0832f,
...
```

[1] Albert L. Zobrist — A New Hashing Method with Application for Game Playing — Tech. Rep. #88, Computer science Department, University of Wisconsin (1970)