## Universal Coding (Part I)

It’s been a while since I last talked about data compression, so let us return to the topic this week with universal codes. Universal codes are entropy-like codes but, unlike, say, Huffman codes, they presuppose very little about the distribution of the integers to code.

We can use a universal code optimally for a random variable $X$ if

• all integers may show up: $P(X=n)\neq 0$, $\forall n \in \mathbb{N}$,
• and if, beyond some $n_0$, the distribution is non-increasing, that is, for $n>n_0$ and $m>n_0$ with $n<m$, we have $P(X=n)\geqslant P(X=m)$.

We will show how to build a universal code in a moment, but let us first state the condition that makes a code universal when the probability distribution satisfies the above conditions. A code with length function $L_u(n)$ (the length, in bits, of the code for the integer $n$) is $k$-universal if, and only if,

$\displaystyle\lim_{n\to\infty}\frac{\sum_{i=1}^n P(X=i)L_u(i)}{\sum_{i=1}^n P(X=i)\log_2 i}=k$

That is, asymptotically, the expected length (in bits) of the code is proportional to the expected logarithm of the number being coded.

Let’s now see what a universal code looks like.

One simple code, proposed by Chaitin in 1966, uses unary coding to encode the length of the number, followed by the bits of the number itself. For example, the integer 342 would be encoded as

C(342)=000000001101010110

where the first part, 8 zeroes followed by a 1, encodes the length, namely 9, in unary. The remaining 9 bits are the binary representation of the number. It is fairly simple to show that this code is 2-universal, since it has code length $2\lceil \lg n\rceil$ for all $n$.

A minor optimization can be afforded when we notice that the most significant bit of the number is almost always 1 (that is, unless $n=0$), so we can merge it with the terminating 1 of the unary encoding of the length. The above would become

C'(342)=00000000101010110

thus saving one bit!
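These two codes are simple enough to sketch directly on bit strings. The following is a minimal Python sketch (function names are mine, not the author’s; $n\geqslant 1$ assumed):

```python
def unary(m):
    """Unary code: m-1 zeroes followed by a one."""
    return "0" * (m - 1) + "1"

def chaitin(n):
    """C(n): unary code of the bit length of n, then the bits of n."""
    bits = format(n, "b")             # binary representation of n
    return unary(len(bits)) + bits

def chaitin_merged(n):
    """C'(n): the terminating 1 of the unary part is merged with the
    most significant bit of n, which is always 1 for n >= 1."""
    bits = format(n, "b")
    return "0" * (len(bits) - 1) + bits

print(chaitin(342))         # 000000001101010110 (18 bits)
print(chaitin_merged(342))  # 00000000101010110  (17 bits)
```

Note that `chaitin_merged` need not emit the leading 1 separately: since the most significant bit of `bits` is already a 1, it plays the role of the unary terminator.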

*
* *

The goal, in universal coding, is to get that constant $k$ as close to one as possible. Unfortunately, we cannot get exactly one (unless we allow ourselves to be somewhat hand-wavy about limits), but we can get very close. Let’s formalize things first.

Let $C_a(n)$ be the unary code for $n$, $n-1$ zeroes followed by a one.

Let $C_b(n)$ be the binary code for $n$ on $\lceil\lg n\rceil$ bits. Let $C_{\hat{b}}(n)$ be the binary code of $n$ with the most significant bit dropped. That is, $C_b(342)=101010110$ and $C_{\hat{b}}(342)=01010110$.
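These helper codes translate directly into a few lines of Python (a sketch of my own, with $n\geqslant 1$ assumed):

```python
def C_a(n):
    """Unary code: n-1 zeroes followed by a one."""
    return "0" * (n - 1) + "1"

def C_b(n):
    """Plain binary representation of n."""
    return format(n, "b")

def C_b_hat(n):
    """Binary representation of n with its (always-1) most
    significant bit dropped."""
    return format(n, "b")[1:]

print(C_b(342))      # 101010110
print(C_b_hat(342))  # 01010110
```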

So the first code uses $2\lceil\lg n\rceil$ bits to encode $n$, and that’s somewhat inefficient. Can we shorten it significantly? If we use

$C''(n) = C'(\lceil \lg n\rceil)C_{\hat{b}}(n)$

as a code, we now have a length of

$|C''(n)|=|C'(\lceil \lg n\rceil)|+|C_{\hat{b}}(n)|$

$=\left(2\lceil\lg \lceil \lg n\rceil\rceil-1\right)+\left(\lceil \lg n \rceil-1\right)$

$=2\lceil\lg\lceil \lg n\rceil\rceil+\lceil \lg n\rceil-2$

and the ratio of this length to $\lg n$ tends to 1 as $n\to\infty$.
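Putting the pieces together, $C''$ can be sketched as follows (my own sketch; function names are assumptions, and $n\geqslant 2$ is assumed so the length itself has a leading 1 to merge):

```python
def c_prime(n):
    """C'(n): length-1 zeroes, then the bits of n; the MSB of n
    doubles as the unary terminator."""
    bits = format(n, "b")
    return "0" * (len(bits) - 1) + bits

def c_b_hat(n):
    """Binary representation of n with its most significant bit dropped."""
    return format(n, "b")[1:]

def c_double_prime(n):
    """C''(n): the bit length of n coded with C', then the bits of n
    without their leading 1."""
    return c_prime(n.bit_length()) + c_b_hat(n)

print(c_double_prime(342))       # 000100101010110
print(len(c_double_prime(342)))  # 15 = 2*ceil(lg 9) + 9 - 2
```

For 342, the bit length 9 costs $2\lceil\lg 9\rceil-1=7$ bits as $C'(9)$, and the number itself costs $9-1=8$ more bits, for a total of 15, matching the length formula above.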

*
* *

But we can do even better! It is indeed possible to create codes that have $\lg\lg n + \lg\lg\lg n+\cdots$ bits in the prefix, which is much smaller than $2\lg n$. But let’s keep some for next time.

To be continued…
