The Substitution Cipher

Some time ago, we presented the Caesar Cipher, developed a simple language model that allowed us to break the cipher relatively easily. This week, we will look at (simple) substitution ciphers.

cipher-coin

Superficially, substitution ciphers seem much stronger than Caesar’s
cipher because, rather than just using shifting of the alphabet, it uses an
arbitrary substitution, for example,

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

H W P G S K Z B Q L C O U Y D I X E J F N M A V R T

Which encodes A as H, B as W, etc.

This produces reasonably garbled messages, at least at first sight. The clear text

The Executive Magistrate of the American Union, unmindful of his obligation to execute the laws for the equal benefit of his fellow citizens, has sanctioned a censorship of the press, by which papers incompatible with the compact are excluded from the southern mails, and he has officially advised Congress to do by law, although in violation of the Constitution, what he had himself virtually done already in despite of both.

could be encoded as

hun ndnxqhmcn obrmihvbhn pk hun bonvmxbl qlmpl, qlomlfkqt pk umi pjtmrbhmpl hp ndnxqhn hun tbgi kpv hun nsqbt jnlnkmh pk umi knttpg xmhmynli, ubi iblxhmplnf b xnlipviumz pk hun zvnii, ja gumxu zbznvi mlxpozbhmjtn gmhu hun xpozbxh bvn ndxtqfnf kvpo hun ipqhunvl obmti, blf un ubi pkkmxmbtta bfcminf xplrvnii hp fp ja tbg, bthupqru ml cmptbhmpl pk hun xplihmhqhmpl, gubh un ubf umointk cmvhqbtta fpln btvnbfa ml fnizmhn pk jphu.

for some random substitution. While this looks rather garbled, it takes relatively little work to decipher. First, we have statistical analysis and knowledge about the English language (if we do suspect it’s written in English) that let us guess that “hun” is most likely “the” (since it repeats a number of times), “ubi” is probably “has” or “and”, and that gives us groups of three letters against which we can start. We can, using frequency analysis, tentatively guess other letters, then using pattern matching, and soon recover the message.

*
* *

Implementing a substitution cipher cleanly is a matter of a few lines of code. A C++ implementation looks something like:

class substitution
 {
  private:

  char forward[26];
  char inverse[26];

 public:

  char encode(char x) const
   {
    if (!std::islower(x)) return x;
    return forward[x-'a'];
   }

  char decode(char x) const
   {
    if (!std::islower(x)) return x;
    return inverse[x-'a'];
   }


  substitution(my::random::random_base<unsigned> & r)
   {
    // "If there is no null byte among the first n
    // bytes of src, the string placed in dest will
    // not be null-terminated."
    std::strncpy(forward,"abcdefghijklmnopqrstuvwxyz",26);

    // Fisher-Yates
    for (int i=25;i;i--)
     std::swap(forward[i],forward[r % i]);

    // and the inverse...
    for (int i=0;i<26;i++)
     inverse[forward[i]-'a']='a'+i;
   }
 };

where my::random::random_base & r is a base class for (uniform?) random number generators, and r controls the generation of the substitution (the seed of the generator is the “key”).

As most often, everything happens in the constructor. The first part uses the Fisher-Yates shuffling to generate a random (forward) substitution. The second part, somewhat tricky, computes the inverse, very elegantly so, for ‘a’ to ‘z’.

*
* *

Now, how can we automate the the deciphering of (simple) substitution ciphers? Well, we can use frequency analysis, as hinted earlier, or we can use a language model to help us look for a suitable (inverse) substitution. Maybe we could use genetic programming?

To be continued…

2 Responses to The Substitution Cipher

  1. Anyone with mad substitution-cryptography skills and a love of sci-fi is welcome to attempt my moderately difficult interstellar cryptography puzzle at

    https://thismoonlesssky.wordpress.com/2013/11/05/a-little-game-to-start-with-easy-interstellar-cryptography-puzzle/

  2. […] while ago, we had a look at a (simple) substitution cipher. Recall, this cipher proceeds by, as its name suggests, substituting one letter for another. For […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: