Breaking Caesar’s Cipher (part III)

23/04/2013

In the last installment of this series, we looked at Markov chains as a mean of estimating the likelihood of a given piece of text of actually being a message, written in English, rather than mere gibberish.

This week, we finally piece everything together to obtain a program to crack Caesar’s cipher without (much) human intervention.

Read the rest of this entry »


Breaking Caesar’s Cipher (Caesar’s Cipher, part II)

16/04/2013

In the last installment of this series, we had a look at Caesar’s cipher, an absurdly simple encryption technique where the symmetric encryption only consists in shifting symbols k places.

markov-chains

While it’s ridiculously easy to break the cipher, even with pen-and-paper techniques, we ended up, last time, surmising that we should be able to crack the cipher automatically, without human intervention, if only we had a reasonable language model. This week, let us have a look at how we could build a very simple language model that does just that.

Read the rest of this entry »


Building a Tree from a List in Linear Time (II)

09/04/2013

Quite a while ago, I proposed a linear time algorithm to construct trees from sorted lists. The algorithm relied on the segregation of data and internal nodes. This meant that for a list of n data items, 2n-1 nodes were allocated (but only n contained data; the n-1 others just contained pointers.

wood

While segregating structure and data makes sense in some cases (say, the index resides in memory but the leaves/data reside on disk), I found the solution somewhat unsatisfactory (but not unacceptable). So I gave the problem a little more thinking and I arrived at an algorithm that produces a tree with optimal average depth, with data in every node, in linear time and using at most \Theta(\lg n) extra memory.

Read the rest of this entry »


Caesar’s Cipher

02/04/2013

Julius Caesar, presumably sometimes during the war in Gaul, according to Suetonius, used a simple cipher to ensure the privacy of his communications.

cipher-coin

Caesar’s method can hardly be considered anything close to secure, but it’s still worthwhile to have a look at how you can implement it, and break it, mostly because it’s one of the simplest substitution ciphers.

Read the rest of this entry »


A Special Case…

26/03/2013

Expressions with floors and ceilings (\lfloor x \rfloor and \lceil y \rceil) are usually troublesome to work with. There are cases where you can essentially remove them by a change of variables.

stairs

Turns out, one form that regularly comes up in my calculations is \lfloor \lg x \rfloor, and it bugged me a while before I figured out the right way of getting rid of them (sometimes).

Read the rest of this entry »


Compressed Series (Part II)

12/03/2013

Last week we looked at an alternative series to compute e, and this week we will have a look at the computation of e^x. The usual series we learn in calculus textbook is given by

\displaystyle e^x=\sum_{n=0}^\infty \frac{x^n}{n!}

We can factorize the expression as

Read the rest of this entry »


Compressed Series (Part I)

05/03/2013

Numerical methods are generally rather hard to get right because of error propagation due to the limited precision of floats (or even doubles). This seems to be especially true with methods involving series, where a usually large number of ever diminishing terms must added to attain something that looks like convergence. Even fairly simple sequences such as

\displaystyle e=\sum_{n=0}^\infty \frac{1}{n!}

may be complex to evaluate. First, n! is cumbersome, and \displaystyle \frac{1}{n!} becomes small exceedingly rapidly.

Read the rest of this entry »


Suggested Reading: Cryptography: The science of secret writing

16/02/2013

Laurence Dwight Smith —Cryptography: The science of secret writing — Dover, 1943, 164 pp. ISBN 0-486-20247-X

(Buy at Amazon.com)

(Buy at Amazon.com)

Read the rest of this entry »


Damn you, Napier!

15/01/2013

Briggs‘ logarithms (often mistakenly understood to be Napier‘s logarithms) is such an useful function that most of us don’t really think about it, until we have to. Everyone’s familiar with its properties:

\displaystyle\log_b a = \frac{\log a}{\log b}

\log b^x = x \log b

\log a b = \log a + \log b (1)

\displaystyle\log \frac{a}{b} = \log a - \log b

but suddenly,

\log (a+b) = ?

What can we do with this last one?

Read the rest of this entry »


Stemming

10/07/2012

A few weeks ago, I went to Québec Ouvert Hackathon 3.3, and I was most interested by Michael Mulley’s Open Parliament. One possible addition to the project is to use cross-referencing of entries based not only on the parliament-supplied subject tags but also on the content of the text itself.

One possibility is to learn embeddings on bags of words but on stemmed words to reduce the dimensionality of the one-hot vector, essentially a bitmap where the bit corresponding to a word is set to 1 if it appears in the bag of words. So, let us start at the beginning, stemming.

Read the rest of this entry »