The anatomy of hash functions (Hash functions, part II)

Before we go on exploring hash functions for look-up, let’s discuss their basic anatomy. This will give us some vocabulary as well as help us identify what are the important characteristics of good hash functions.

The general anatomy of hash function is composed of four principal parts, as shown in the block diagram here below.

The hash value, denoted in the diagram (and as the output), is the value computed so far by the hash function. It receives some initial value, , and is successively modified by each symbol of the input.

The input symbol, shown as in the diagram, is the -th part of the input, fed at time to the function. In turn, in causes the hash value to change, yielding .

The combination step combines the hash value with the next input symbol . This operation can be as simple as adding the symbol to the current hash value; but it’s preferably an operation whose result is much harder to predict.

The confusion step mixes up (not necessarily a permutation) the hash value yielded by the combination step. This step is necessary to mitigate any inefficiencies the combination step may exhibit. If you would have a perfect combination step, the confusion step would be omitted, but, as you might have guessed, the combination step is rarely that perfect and some additional help is needed to produce strong, random-looking, values.

* * *

A good hash function must exhibit confusion and diffusion. Diffusion (in Shannon’s sense) is observed when a single bit changed in the input affects many bits in the output. Confusion in Shannon’s sense (not the same as the figure above) is when an output bit depends on several input bits. If your hash function exhibits both properties, then it will be robust.

* * *

The choice of the initial hash value, , may not be as critical as one might think at first. It may seem that choosing a good, random-looking initial value will kick-start the function and help produce stronger outputs, but it really only depends on the hash function itself. If the hash function hash good enough combination and confusion steps, the effect of the initial value—often taken to be zero anyway—will be lessened because the two steps randomize the value sufficiently given input symbols. If, on the contrary, the hash function is really bad (say, a simple sum of the input symbols), then the initial value will only introduce a constant bias to the output values, and not make them a lot more random.

This entry was posted on Tuesday, October 6th, 2015 at 18:54 pm and is filed under algorithms, data structures. You can follow any responses to this entry through the RSS 2.0 feed.
You can leave a response, or trackback from your own site.

One Response to The anatomy of hash functions (Hash functions, part II)

[…] simple functions won’t do a good job at hashing. Recall, the hash function must have both good combination and confusion steps. Addition alone isn’t […]

RT @PastMedHistory: Doramad Radioactive Toothpaste was produced in Germany by Auergesellschaft of Berlin from the 1920s through World War I… 8 hours ago

[…] simple functions won’t do a good job at hashing. Recall, the hash function must have both good combination and confusion steps. Addition alone isn’t […]