It’s been a while since I last talked about data compression. Let us return to the topic this week with universal codes. Universal codes are entropy-like codes but, unlike, say, Huffman codes, they presuppose very little about the distribution of the integers to code.
We can optimally use a universal code for a random variable $X$ if
- all integers may show up: $P(X=n)>0$ for all $n\geqslant 1$,
- and if, beyond some $n_0$, the distribution is non-increasing, that is, for $i$ and $j$ with $n_0\leqslant i\leqslant j$, we have $P(X=i)\geqslant P(X=j)$.
We will show how to build a universal code in a moment, but we will first state the condition that makes a code universal when the probability distribution satisfies the above conditions. A code with length function $L(n)$ (the length of the code for the integer $n$) is $k$-universal if, and only if,

$$E[L(X)] \leqslant k\,E[\log_2 X] + c$$

for some constant $c$. That is, the expectation of the length (in bits) of the code is proportional to the logarithm of the number being coded.
Let’s now see what a universal code looks like.
One simple code, proposed by Chaitin in 1966, is to use unary coding to encode the length of the number’s binary representation, followed by the bits of the number itself. For example, the integer 342 would be encoded as

0000000001 101010110

where the first part, 9 zeroes followed by a 1, encodes the length, namely, 9. Then follows the binary representation of the number on 9 bits. It is fairly simple to show that this code is 2-universal, since it has code length $2\lfloor\log_2 n\rfloor+3$ for all $n\geqslant 1$.
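Concretely, this scheme is easy to sketch in Python (the function name `unary_length_code` is mine, not standard terminology):

```python
def unary(k):
    """Unary code for k: k zeroes followed by a one."""
    return "0" * k + "1"

def unary_length_code(n):
    """Encode n >= 1 as the unary code of its bit length, then its bits.
    (Hypothetical name for the scheme described in the text.)"""
    assert n >= 1
    bits = bin(n)[2:]              # binary representation, MSB first
    return unary(len(bits)) + bits

# 342 -> "0000000001" + "101010110", 19 bits in all
print(unary_length_code(342))
```

For 342, the unary prefix costs 10 bits and the payload 9, for a total of $2\lfloor\log_2 342\rfloor+3=19$ bits.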
A minor optimization can be afforded when we notice that the most significant bit of the number is almost always 1 (that is, unless $n=0$), so we can merge it with the ending 1 of the unary encoding of the length. The above would become

000000000 101010110
thus saving one bit!
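Here is a sketch of this merged variant, with a decoder to check that the code is still uniquely decodable (function names are mine):

```python
def encode(n):
    """Merged variant: the number's leading 1 doubles as the unary terminator."""
    assert n >= 1
    bits = bin(n)[2:]
    return "0" * len(bits) + bits

def decode(s):
    """Count the zeroes to recover the bit length, then read that many bits."""
    z = 0
    while s[z] == "0":
        z += 1
    # the bits of n occupy positions z .. 2z-1, starting with the terminator 1
    return int(s[z:2 * z], 2)

c = encode(342)
print(c, len(c), decode(c))   # 18 bits instead of 19, and it round-trips
```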
The goal, in universal coding, is to get that constant $k$ as close to one as possible. Unfortunately, we cannot get one exactly (unless we allow ourselves to be somewhat hand-wavy about limits), but we can get very close. Let’s formalize things first.
Let $u(n)$ be the unary code for $n$: $n$ zeroes followed by a one.

Let $b(n)$ be the binary code for $n$ on $\lfloor\log_2 n\rfloor+1$ bits. Let $\bar{b}(n)$ be the binary code of $n$ with the most significant bit dropped. That is, $b(342)=101010110$ and $\bar{b}(342)=01010110$.
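These helpers translate directly into Python (a sketch; I write `b_bar` for $\bar{b}$):

```python
def u(n):
    # unary code for n: n zeroes followed by a one
    return "0" * n + "1"

def b(n):
    # binary code for n, on floor(log2 n) + 1 bits
    return bin(n)[2:]

def b_bar(n):
    # b(n) with its most significant bit (always a 1) dropped
    return bin(n)[3:]

print(u(3), b(342), b_bar(342))   # 0001 101010110 01010110
```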
So the first code uses $2\lfloor\log_2 n\rfloor+3$ bits to encode $n$, and that’s somewhat inefficient. Can we shorten it significantly? If we use

$$u(|\bar{b}(|b(n)|)|)\;\bar{b}(|b(n)|)\;\bar{b}(n)$$

as a code, we now have a length of

$$L(n)=\lfloor\log_2 n\rfloor+2\lfloor\log_2(\lfloor\log_2 n\rfloor+1)\rfloor+1,$$

and the limit, as $n\to\infty$, of $L(n)$ relative to $\log_2 n$ is 1.
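This construction can be sketched in Python as follows (a sketch under the notation above, with a decoder to check it round-trips; for 342 it yields a 15-bit code, down from the original 19):

```python
def u(k):
    # unary code for k: k zeroes followed by a one
    return "0" * k + "1"

def b_bar(n):
    # binary code of n with its most significant bit (always a 1) dropped
    return bin(n)[3:]

def delta_like(n):
    """Encode the bit length of n (itself stripped of its leading 1,
    preceded by the unary code of the remaining length), then n stripped
    of its leading 1."""
    assert n >= 1
    k = len(bin(n)) - 2            # number of bits of n, |b(n)|
    return u(len(b_bar(k))) + b_bar(k) + b_bar(n)

def delta_decode(s):
    z = 0
    while s[z] == "0":             # unary part: length of b_bar(k)
        z += 1
    pos = z + 1
    k = int("1" + s[pos:pos + z], 2)       # restore the leading 1 of k
    pos += z
    return int("1" + s[pos:pos + k - 1], 2)  # restore the leading 1 of n

c = delta_like(342)
print(c, len(c), delta_decode(c))
```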
But we can do even better! It is indeed possible to create codes that use about $\log_2\log_2 n$ bits in the prefix, which is much smaller than the roughly $2\log_2\log_2 n$ bits we have just used. But let’s keep some for next time.
To be continued…