I am still experimenting with hash functions, and I was toying with the Zobrist hash function that is best known for its use in chess engines. The hash function is conceptually simple: you need a large table of random numbers, indexed, in a chess application, by the position on the board of the piece and by the piece itself. To compute a hash for a whole board configuration, you simply xor all the random numbers together. The hard part is choosing the random numbers.
In the recent panic surrounding the Heartbleed bug, we ask ourselves why, and how, these bugs still happen. We know that it was a preventable bug, with a simple fix, but with potentially important repercussions.
A while ago, we had a look at a (simple) substitution cipher. Recall, this cipher proceeds by, as its name suggests, substituting one letter for another. For the cipher to be reversible, the substitution table is in fact a permutation of the alphabet. This permutation is the cipher “key”.
We surmised that frequency analysis can help us decode a message enciphered with a substitution, but that does not help us with atypical or very short messages. What if the message is somewhat skewed, say it doesn’t even have the letter E? Can we still use the (expected) letter frequencies to start decoding it? Well, maybe. Could we use some other approach? What if we had a language model that helps us decide if a tentative deciphering makes sense? Oh, right, we do have one. And since we can use it as a “fitness” function, we can apply genetic programming to find the code!
Superficially, substitution ciphers seem much stronger than Caesar’s
cipher because, rather than just using shifting of the alphabet, it uses an
arbitrary substitution, for example,
In the last installment of this series, we looked at Markov chains as a mean of estimating the likelihood of a given piece of text of actually being a message, written in English, rather than mere gibberish.
This week, we finally piece everything together to obtain a program to crack Caesar’s cipher without (much) human intervention.
While it’s ridiculously easy to break the cipher, even with pen-and-paper techniques, we ended up, last time, surmising that we should be able to crack the cipher automatically, without human intervention, if only we had a reasonable language model. This week, let us have a look at how we could build a very simple language model that does just that.
Caesar’s method can hardly be considered anything close to secure, but it’s still worthwhile to have a look at how you can implement it, and break it, mostly because it’s one of the simplest substitution ciphers.