The anatomy of hash functions (Hash functions, part II)

Before we go on exploring hash functions for look-up, let’s discuss their basic anatomy. This will give us some vocabulary as well as help us identify what are the important characteristics of good hash functions.

the-anatomy-lesson-small

The general anatomy of hash function is composed of four principal parts, as shown in the block diagram here below.

hash-block-diagram

The hash value, denoted h in the diagram (and h_t as the output), is the value computed so far by the hash function. It receives some initial value, h_0, and is successively modified by each symbol of the input.

The input symbol, shown as x_t in the diagram, is the t-th part of the input, fed at time t to the function. In turn, in causes the hash value to change, yielding h_t.

The combination step combines the hash value h_{t-1} with the next input symbol x_t. This operation can be as simple as adding the symbol to the current hash value; but it’s preferably an operation whose result is much harder to predict.

The confusion step mixes up (not necessarily a permutation) the hash value yielded by the combination step. This step is necessary to mitigate any inefficiencies the combination step may exhibit. If you would have a perfect combination step, the confusion step would be omitted, but, as you might have guessed, the combination step is rarely that perfect and some additional help is needed to produce strong, random-looking, values.

*
* *

A good hash function must exhibit confusion and diffusion. Diffusion (in Shannon’s sense) is observed when a single bit changed in the input affects many bits in the output. Confusion in Shannon’s sense (not the same as the figure above) is when an output bit depends on several input bits. If your hash function exhibits both properties, then it will be robust.

*
* *

The choice of the initial hash value, h_0, may not be as critical as one might think at first. It may seem that choosing a good, random-looking initial value will kick-start the function and help produce stronger outputs, but it really only depends on the hash function itself. If the hash function hash good enough combination and confusion steps, the effect of the initial value—often taken to be zero anyway—will be lessened because the two steps randomize the value sufficiently given input symbols. If, on the contrary, the hash function is really bad (say, a simple sum of the input symbols), then the initial value will only introduce a constant bias to the output values, and not make them a lot more random.

One Response to The anatomy of hash functions (Hash functions, part II)

  1. […] simple functions won’t do a good job at hashing. Recall, the hash function must have both good combination and confusion steps. Addition alone isn’t […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: