Suppose you have been sub-contracted by a video game company, say Nintendo, to come up with a low-power, low-memory implementation of a popular video game, say, an installment of the infamous Pokémon series. The envisaged device is very slow (it has a 2 MHz processor) but has a 220×176 pixel color screen capable of displaying 32768 different colors. It also has a hardware sprite mode, but sprites are limited to 4 distinct colors. You have to encode the complete database of Pokémon, which number several hundred, in as little space as possible. The images vary somewhat in size, but each can only use 4 distinct colors from the possible palette of 32768 colors.
A typical Pokémon would look like:
Since the artists put a lot of effort into drawing the critters in 4 colors, and because of the hardware requirements, you cannot use a compression algorithm that modifies colors or creates more colors than the original image had, which excludes JPEG. Moreover, the JPEG codec would be too complex to port to the platform considered. You must therefore use a simple lossless codec.
We will be comparing 5 codecs on the Pokémon image database. We will be looking at a “block” codec, which will serve as a non-RLE baseline. We will then look at four variations on the theme of run length encoding, or RLE, namely All Counts, Bit Select, and Auto-Trigger RLE. The fourth variant will help Auto-Trigger RLE by using predictors, transforming the input image into one that is easier to compress using RLE.
First, let us describe the physical properties of the images. In the images, there are only four colors drawn from a possible set of 32768. For each image, still assuming lossless coding, we will dedicate 45 bits to the color palette. One could think that four 15-bit colors would mean 60 bits total, but one can be removed (coded implicitly) since one of the colors is always white; therefore only 45 bits are necessary. The physical dimensions of the images vary but never exceed 64×64. Assuming a −1 bias (so that sizes 1 to 64 are encoded using 0 to 63), only 12 bits are necessary to encode the image sizes. So, whatever the compression used, we have 12+45=57 bits of overhead before encoding the image itself.
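To make the header arithmetic concrete, here is a minimal Python sketch; the function name and the choice of 0x7FFF as the implicitly coded white are my own assumptions, not part of the original format.

```python
def header_bits(width, height, palette):
    """Per-image header cost: 6+6 bits for the -1 biased dimensions,
    plus 15 bits per palette entry except white, which is implicit."""
    assert 1 <= width <= 64 and 1 <= height <= 64
    size_bits = 6 + 6                     # (width-1) and (height-1), each 0..63
    WHITE = 0x7FFF                        # assumed 15-bit value for white
    color_bits = 15 * sum(1 for c in palette if c != WHITE)
    return size_bits + color_bits

# A 4-color palette containing white costs 12 + 3*15 = 57 bits.
print(header_bits(56, 56, [0x7FFF, 0x1234, 0x0ACE, 0x0000]))  # 57
```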
The Block Compressor. Let us suppose we’re just starting to tackle the problem of compressing the Pokémon images. We look at the images and see that there are either long runs of repeating pixels (same colors), or runs of pixels of different values because more colors are simulated by hand-crafted dithering. We also notice that around the Pokémon’s edges, there are similar transitions. So after thinking a while, we devise the following code:
Code                 Pixels
0 00             --> 00 00 00 00
0 01             --> 01 01 01 01
0 10             --> 10 10 10 10
0 11             --> 11 11 11 11
10               --> repeat the last seen pixel 2 times
11 + xx yy zz ww --> copy the 4 pixels (xx, yy, zz, ww)
So a run of 4 repeating pixels gets encoded in 3 bits instead of 8, two repeating pixels are encoded in 2 bits, and finally, an arbitrary group of 4 pixels gets encoded in 10 bits. This scheme will serve as the reference naïve codec.
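A possible greedy implementation of this block code, as a Python sketch; the tie-breaking order between codes and the handling of trailing pixels are my own guesses, not the author's implementation.

```python
def block_encode(pixels):
    """Greedy encoder for the naive block code: runs of 4 equal
    pixels cost 3 bits, a pair repeating the last pixel costs 2 bits,
    and 4 literal pixels cost 10 bits. Returns a '0'/'1' bit string."""
    out = []
    i, last = 0, None
    while i < len(pixels):
        p = pixels[i]
        if pixels[i:i + 4] == [p] * 4:            # run of 4: '0' + color
            out.append('0' + format(p, '02b'))
            i, last = i + 4, p
        elif last is not None and pixels[i:i + 2] == [last] * 2:
            out.append('10')                       # repeat last pixel twice
            i += 2
        else:                                      # literal group: '11' + 4 pixels
            lit = pixels[i:i + 4]
            out.append('11' + ''.join(format(q, '02b') for q in lit))
            i += len(lit)
            last = lit[-1]
    return ''.join(out)

print(block_encode([0, 0, 0, 0]))        # '000'
print(block_encode([1, 1, 1, 1, 1, 1]))  # '00110'
```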
RLE. The reference codec isn’t very good at capturing arbitrary lengths of repeating pixels. Maybe we can do better. What could we do? Couldn’t we rather encode arbitrary repetitions? Oh. Wait. That’s what Run Length Encoding, or RLE, is about! But there are many possible flavors of RLE.
All Counts RLE. The simplest, and unsurprisingly the least efficient, variant encodes every run as a count and pixel pair. If there’s a run exactly one pixel long, we encode the pair (1,p), where p is the pixel value. If we have 5 pixels with the same value in a row, we encode the pair (5,p), and so on. We can introduce a −1 bias to encode one repetition as 0, 2 as 1, 3 as 2, etc. If we use a fixed number of bits, say b, to encode the run length, the bias allows us to encode runs in the range 1 to 2^b rather than 0 to 2^b−1.
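As a sketch, All Counts RLE with the −1 bias can be written as follows (unbounded Python counts for clarity; a real encoder would split runs longer than its fixed b-bit field):

```python
def all_counts_rle(pixels):
    """All Counts RLE: every run becomes a (count-1, pixel) pair,
    using the -1 bias described in the text."""
    pairs = []
    i = 0
    while i < len(pixels):
        j = i
        while j < len(pixels) and pixels[j] == pixels[i]:
            j += 1                              # extend the current run
        pairs.append((j - i - 1, pixels[i]))    # biased count, pixel value
        i = j
    return pairs

print(all_counts_rle([2, 2, 2, 2, 2, 1, 3, 3]))  # [(4, 2), (0, 1), (1, 3)]
```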
The Bit Select RLE. Since it is inefficient to encode a long series of different pixels by as many pairs as there are pixels, one can consider an encoding where a bit indicates whether the code encodes a repeat or a string of literal pixels. A series of literal pixels is simply a series of pixels to be copied as is. The encoding then takes the form (1,rep,p) for a pixel repeated rep+1 times, and (0,rep,p1,p2,…) for a string of rep+1 literal pixels.
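A sketch of the Bit Select encoder, using the (bit, rep, …) token layout from the text; the threshold min_run deciding when a run is worth a repeat token is my own parameter:

```python
def bit_select_rle(pixels, min_run=2):
    """Bit Select RLE: tokens are (1, rep, p) for a pixel repeated
    rep+1 times, or (0, rep, literals) for rep+1 literal pixels."""
    tokens, lits = [], []
    i = 0
    while i < len(pixels):
        j = i
        while j < len(pixels) and pixels[j] == pixels[i]:
            j += 1
        if j - i >= min_run:                     # run worth a repeat token
            if lits:                             # flush pending literals first
                tokens.append((0, len(lits) - 1, lits))
                lits = []
            tokens.append((1, j - i - 1, pixels[i]))
        else:
            lits.extend(pixels[i:j])             # too short: keep as literals
        i = j
    if lits:
        tokens.append((0, len(lits) - 1, lits))
    return tokens

print(bit_select_rle([0, 1, 2, 2, 2, 2, 3]))
```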
The Auto-Trigger RLE. Rather than allocating bits for repetitions on every pixel, we might be better off detecting repetitions in another way. Let’s say we rather encode pixels one by one until a repetition is detected. A repetition is detected whenever n consecutive pixels are the same. The optimal value of n is determined by the source to encode. For the Pokémon image database, the optimal n is 2. So after two repeating pixels, a repeat code is emitted. If the repetitions continue, the number of remaining pixels in that run is encoded, otherwise 0 is encoded: it’s “0 more repetitions” if the next pixel breaks the repeating sequence.
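The trigger logic can be sketched as follows; the mixed literal/count stream representation is mine, for illustration only:

```python
def auto_trigger_rle(pixels, n=2):
    """Auto-Trigger RLE: pixels are emitted literally; once n identical
    pixels in a row have been emitted, a count of the remaining
    repetitions follows (0 if the run stops there)."""
    out = []          # ('p', value) for literals, ('c', count) for counts
    i = 0
    while i < len(pixels):
        out.append(('p', pixels[i]))
        # did we just emit the n-th identical pixel in a row?
        if i + 1 >= n and all(pixels[k] == pixels[i] for k in range(i - n + 1, i)):
            j = i + 1
            while j < len(pixels) and pixels[j] == pixels[i]:
                j += 1
            out.append(('c', j - i - 1))    # remaining run length, may be 0
            i = j
        else:
            i += 1
    return out

print(auto_trigger_rle([0, 0, 0, 0, 0, 1]))  # [('p', 0), ('p', 0), ('c', 3), ('p', 1)]
```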
Variable Length Codes. Fixed-length codes are probably not very efficient to encode run lengths. After looking at the distribution of run lengths, I found that the following variable length code (or VLC) is nearly optimal:
Code             Lengths
00               0
01               1
100 + 1 bit      2, 3
101 + 2 bits     4, ..., 7
110 + 3 bits     8, ..., 15
1110 + 4 bits    16, ..., 31
1111 + 5 bits    32, ..., 63
And those will be the variable length codes for the run encoding in all implementations of RLE (except for the Block coder, which isn’t RLE at all).
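A direct transcription of this VLC table into Python might read:

```python
def vlc_encode(n):
    """Encode a run length 0..63 with the near-optimal VLC table:
    a prefix selects the range, then an offset fills the low bits."""
    if n == 0:  return '00'
    if n == 1:  return '01'
    if n <= 3:  return '100' + format(n - 2, '01b')
    if n <= 7:  return '101' + format(n - 4, '02b')
    if n <= 15: return '110' + format(n - 8, '03b')
    if n <= 31: return '1110' + format(n - 16, '04b')
    if n <= 63: return '1111' + format(n - 32, '05b')
    raise ValueError("run length out of range")

print(vlc_encode(5))   # '10101'
print(vlc_encode(63))  # '111111111'
```

Note how the code spends only 2 bits on the very frequent short runs and at most 9 bits on the rarest, longest ones.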
However, the results from the first four variants aren’t all that impressive. Starting at 2 bits per pixel in the original image, the block coder yields a 1.38:1 compression ratio, All Counts fares similarly with 1.34:1, and Bit Select is an absolute catastrophe with an insignificant compression ratio of 1.02:1. Finally, Auto-Trigger RLE yields 1.71:1 compression, which is by far the best amongst the more naïve methods.
Predictive RLE. This method is considerably more complex than the methods presented so far (while still being rather simple in absolute terms). The previous methods are limited by the repetitions found in the pixels. If there’s a regular pattern (as there often is in dithering), the RLE compressor cannot take advantage of it. However, we can transform the original image so that the RLE compressor is able to form much longer runs and therefore yield much better compression.
To do so, we will use prediction. In order to guess what the next pixel in the image will be, we will form contexts and accumulate frequencies of occurrence of pixels given the contexts. The context is defined as the pixel right above and the pixel to the immediate left of the pixel we want to predict, as the following illustrates:
For the sake of bootstrapping the process, the pixels that fall outside the image are considered as the color 0.
For each context (of which there are 16), we will maintain four counts, one for each color. While we scan the entire image, we form the contexts (c1,c2) and increment the count corresponding to the color observed in the ‘?’ pixel. Once the image is scanned, we have all contexts populated and we proceed to the second phase.
In the second phase, we establish, for each context, a color ranking. The ranking assigns, to each color in a given context, a rank: the most frequent color receives rank 1, the second most frequent rank 2, and so on. Once ranks are assigned for each color in each context, we discard the frequency information, as we won’t be needing it again.
In the third phase, we rescan the image and replace the pixels by their ranks given their contexts. This will help RLE compress longer series of pixels since, even if they were different in the original image, their ranks might be the same given their contexts. Once all pixels are replaced by their ranks, we apply Auto-Trigger RLE with an auto-repeat length of 2.
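The three phases can be sketched as follows. I use 0-based ranks where the text numbers them from 1, and break frequency ties by color index; these are my own arbitrary (but decoder-reproducible) choices.

```python
def rank_transform(img):
    """Predictive transform sketch: img is a list of rows of 2-bit
    pixels; out-of-image neighbors count as color 0. Returns the image
    with each pixel replaced by the rank (0 = most frequent) of its
    color within its (above, left) context."""
    H, W = len(img), len(img[0])
    counts = [[0] * 4 for _ in range(16)]   # 16 contexts x 4 colors

    def ctx(y, x):
        above = img[y - 1][x] if y > 0 else 0
        left = img[y][x - 1] if x > 0 else 0
        return above * 4 + left

    # phase 1: accumulate color frequencies per context
    for y in range(H):
        for x in range(W):
            counts[ctx(y, x)][img[y][x]] += 1

    # phase 2: rank colors in each context (ties broken by color index)
    rank = [[0] * 4 for _ in range(16)]
    for c in range(16):
        order = sorted(range(4), key=lambda col: -counts[c][col])
        for r, col in enumerate(order):
            rank[c][col] = r

    # phase 3: replace each pixel by its rank given its context
    return [[rank[ctx(y, x)][img[y][x]] for x in range(W)] for y in range(H)]

# A uniform patch of any color collapses to all-zero ranks,
# exactly the long runs RLE thrives on.
print(rank_transform([[2, 2, 2], [2, 2, 2]]))
```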
This method requires the coding of the rankings, at a cost of 10 bytes: we have 16 contexts, each with a ranking of the 4 colors, which can be encoded in at most 5 bits per ranking, since there are only 24 possible permutations of 1, 2, 3, 4.
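A sketch of that permutation coding, using Python's lexicographically ordered permutations as the shared table; any fixed order agreed on by encoder and decoder works:

```python
from itertools import permutations

# The 24 permutations of (0, 1, 2, 3) in a fixed order shared by
# encoder and decoder; an index 0..23 fits in 5 bits, so the
# 16 rankings cost 16 * 5 = 80 bits = 10 bytes.
PERMS = list(permutations(range(4)))

def perm_index(ranking):
    """Map one context's ranking (a permutation of the colors)
    to its 5-bit table index."""
    return PERMS.index(tuple(ranking))

print(perm_index([0, 1, 2, 3]))   # 0
print(perm_index([3, 2, 1, 0]))   # 23
```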
The decompression proceeds in reverse: the data is decompressed using Auto-Trigger RLE, then the ranks are replaced by colors, one by one, using the rank-to-color mapping encoded for each context.
Comparing Results. It is finally the predicted values that are encoded using Auto-Trigger RLE, yielding a much better compression ratio of 1.9:1, a gain of about 12% over the unaided Auto-Trigger RLE and its 1.71:1 compression ratio. Bit Select, All Counts, and Block perform so much worse that they can be discarded from the list of potential compression algorithms.
Comparing relative performances yields:
which clearly indicates that the predictive codec is the winner. The results are averaged over the 746 Pokémon bitmaps in the test set.
In this post, we have seen two things. First, we have seen that Auto-Trigger RLE is the best variant of RLE as it does not use special codes for strings of non-repeating pixels, only for pixels that do actually repeat. The optimal number of repetitions n after which a repeat code is emitted depends on the source to code. In the case of the Pokémon database, n = 2 gave the best results (as determined by exhaustive testing).
Second, we have seen that transforming the data before compression can actually yield much better results. In this example, we transformed from pixel-space to ranking-space, which yielded longer repetitions than the original data, with very little extra processing cost.
On compression ratio fairness. The compression ratios presented are relative to the original image at two bits per pixel. However, if I had measured the compression ratio against the original pnm files, I could have claimed a much higher ratio. Indeed, in the original pnm files, each pixel is 24 bits long (3 components, R, G, B, each 8 bits wide), so I could easily have claimed a 22.8:1 compression ratio using predictive RLE coding! That would be dishonest, I think, even though it would give better compression ratios than Zip on that particular series of files.
Compression ratios should always be expressed relative to the smallest non-compressed representation of the data. In our case, the Pokémon images are in fact palettized bitmaps, and each pixel has a natural representation on two bits, relative to a four-entry palette of 15-bit colors. Therefore, all compression ratios are reported relative to the original bitmap at two bits per pixel.
This also raises the question of the original dimensionality of the data. Not only does one have to ask how many bits are necessary for the minimal natural representation of the data (in our example, 2 bits per pixel plus a palette describe the data adequately) but also how much it “fills” its possible space. Speaking of fractal dimension may be exaggerated, and not quite applicable, but one has to figure out what patterns are possible (and then merely likely) in the data and how this influences the possible compression algorithms and the obtained compression ratios. In our case, the Pokémon images share numerous properties: vast blocks of uniform color; edges formed by white pixels followed by a single dark pixel followed by the fill color; colors simulated by regular dithering (checkerboard patterns of two colors); etc. Clearly, the Pokémon occupy only a small portion of all possible 4-color bitmaps. Unless there’s a Whitenoisamon, the images have a high degree of structure, and that is clearly exploitable for compression.