Universal Coding (Part I)

It’s been a while since I last talked about data compression. Let us return to the topic this week with universal codes. Universal codes are entropy-like codes that, unlike, say, Huffman codes, presuppose very little about the distribution of the integers to code.


We can use a universal code optimally for a random variable X if

  • all integers may show up: P(X=n)\neq 0, \forall n \in \mathbb{N},
  • and if, beyond some n_0, the distribution is non-increasing, that is, for n_0 < n < m, we have P(X=n)\geqslant P(X=m).

We will show how to build a universal code in a moment, but we will first state the condition that makes a code universal when the probability distribution satisfies the above. A code with length function L_u(n) (the length, in bits, of the code for the integer n) is k-universal if, and only if

\displaystyle\lim_{n\to\infty}\frac{\sum_{i=1}^n P(X=i)L_u(i)}{\sum_{i=1}^n P(X=i)\log_2 i}=k

That is, asymptotically, the expected length (in bits) of the code is proportional to the expected logarithm of the number being coded.

Let’s now see what a universal code looks like.

One simple code, proposed by Chaitin in 1966, is to use unary coding to encode the length of the number’s binary representation, followed by the bits of the number itself. For example, the integer 342 would be encoded as

000000001 101010110

where the first group, eight zeroes followed by a one, encodes the length, namely 9, in unary. It is followed by the binary representation of the number on 9 bits. It is fairly simple to show that this code is 2-universal, since it has code length 2\lceil \lg n\rceil for all n.
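As a quick sanity check, this encoding is easy to sketch in code. Here is a minimal version in Python (the function name is mine; n is assumed to be a positive integer):

```python
def chaitin_code(n: int) -> str:
    """Chaitin-style code: unary code for the bit length, then the bits of n."""
    bits = bin(n)[2:]                  # binary digits of n, e.g. 342 -> '101010110'
    length = len(bits)                 # number of bits, 9 for n = 342
    unary = '0' * (length - 1) + '1'   # unary code for the length: 8 zeroes and a one
    return unary + bits
```

For n = 342 this yields 000000001 followed by 101010110, for a total of 18 = 2\lceil\lg 342\rceil bits.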

A minor optimization can be afforded when we notice that the most significant bit of the number is always 1 (that is, unless n=0), so we can merge it with the terminating 1 of the unary encoding of the length. The above becomes

00000000 101010110

thus saving one bit!

* *

The goal, in universal coding, is to get that constant k as close to one as possible. Unfortunately, we cannot get exactly one (unless we allow ourselves to be somewhat hand-wavy about limits), but we can get very close. Let’s formalize things first.

Let C_a(n) be the unary code for n, n-1 zeroes followed by a one.

Let C_b(n) be the binary code for n on \lceil\lg n\rceil bits. Let C_{\hat{b}}(n) be the binary code of n with the most significant bit dropped. That is, C_b(342)=101010110 and C_{\hat{b}}(342)=01010110.
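These three building blocks are straightforward to express in code; a minimal Python sketch (the helper names are mine):

```python
def c_a(n: int) -> str:
    """Unary code: n-1 zeroes followed by a one."""
    return '0' * (n - 1) + '1'

def c_b(n: int) -> str:
    """Binary code of n, without leading zeroes."""
    return bin(n)[2:]

def c_b_hat(n: int) -> str:
    """Binary code of n with its most significant bit dropped."""
    return bin(n)[3:]
```

So c_b(342) returns 101010110 and c_b_hat(342) returns 01010110, as above.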

So the first code uses 2\lceil\lg n\rceil bits to encode n, and that’s somewhat inefficient. Can we shorten it significantly? Write C'(n)=C_a(\lceil\lg n\rceil)C_{\hat{b}}(n) for the modified Chaitin code above, with |C'(n)|=2\lceil\lg n\rceil-1. If we use

C''(n) = C'(\lceil \lg n\rceil)C_{\hat{b}}(n)

as a code, we now have a length of

|C''(n)|=|C'(\lceil \lg n\rceil)|+|C_{\hat{b}}(n)|

=\left(2\lceil\lg \lceil \lg n\rceil\rceil-1\right)+\left(\lceil \lg n \rceil-1\right)

=2\lceil\lg\lceil \lg n\rceil\rceil+\lceil \lg n\rceil-2

and the limit of the ratio |C''(n)|/\lg n as n\to\infty is 1.
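Putting the pieces together, C' and C'' can be sketched as follows (a Python sketch; note that n.bit_length() gives \lfloor\lg n\rfloor+1, the number of binary digits of n, which the derivation above writes \lceil\lg n\rceil):

```python
def c_prime(n: int) -> str:
    """Modified Chaitin code: unary code for the bit length of n, where the
    terminating one of the unary part doubles as the leading one of n."""
    k = n.bit_length()                 # number of binary digits of n
    return '0' * (k - 1) + bin(n)[2:]  # k-1 zeroes, then n itself (its MSB is the 1)

def c_double_prime(n: int) -> str:
    """C''(n) = C'(bit length of n) followed by C_b_hat(n)."""
    return c_prime(n.bit_length()) + bin(n)[3:]
```

For n = 342, c_prime gives the 17-bit string 00000000101010110 from the merged code above, and c_double_prime gives a 15-bit string, matching 2\lceil\lg\lceil\lg 342\rceil\rceil+\lceil\lg 342\rceil-2 = 2\cdot 4 + 9 - 2 = 15.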

* *

But we can do even better! It is indeed possible to create codes that have \lg\lg n + \lg\lg\lg n+\cdots bits in the prefix, which is much smaller than 2\lg n. But let’s keep some for next time.

To be continued…

