Suppose you have been sub-contracted by a video game company, say Nintendo, to come up with a low-power, low-memory implementation of a popular video game, say, an installment of the infamous Pokémon series. The envisaged device is very slow (it has a 2 MHz processor) but has a 220×176-pixel color screen capable of displaying 32768 different colors. It also has a hardware sprite mode, but sprites are limited to 4 distinct colors. You have to encode the complete database of Pokémons, several hundred of them, in as little space as possible. The images vary somewhat in size, but each can only have 4 distinct colors drawn from the possible palette of 32768 colors.

A typical Pokémon would look like:

A typical Pokémon image, shown enlarged and in original size.

Since the artists put a lot of effort into drawing the critters in 4 colors, and because of the hardware requirements, you cannot use a compression algorithm that modifies colors or creates more of them than the original image had, therefore excluding JPEG. Moreover, the JPEG codec would be too complex to port to the platform considered. You must therefore use a simple lossless codec.

We will be comparing 5 codecs for the Pokémon image database. The first is a “block” codec, which will serve as the non-RLE baseline. The other four are variations on the theme of run length encoding, or RLE: All Counts, Bit Select, Auto-Trigger, and Predictive RLE. The last variant helps Auto-Trigger RLE by using predictors, transforming the input image into one that is easier to compress using RLE.

First, let us describe the physical properties of the images. In the images, there are only four colors drawn from a possible set of 32768. For each image, still assuming lossless coding, we will have 45 bits dedicated to the color palette. One could think that four 15-bit colors would mean 60 bits total, but one can be removed (coded implicitly) since one of the colors is always white; therefore only 45 bits are necessary. The physical dimensions of the images vary but never exceed 64×64. Assuming a −1 bias (so that sizes 1 to 64 are encoded using 0 to 63), only 12 bits are necessary to encode the image sizes (6 bits per dimension). So, whatever the compression used, we have 12+45=57 bits of overhead before encoding the image itself.

The Block Compressor. Let us suppose we’re just starting to tackle the problem of compressing the Pokémon images. We look at the images and see that there are either long runs of repeating pixels (same colors), or runs of pixels of different values because more colors are simulated by the use of hand-crafted dithering. We also notice that around the Pokémon’s edges, we have similar transitions. So after thinking a while, we devise the following code:

```
Code     Pixels
0 00 --> 00 00 00 00
0 01 --> 01 01 01 01
0 10 --> 10 10 10 10
0 11 --> 11 11 11 11

10   --> repeats the last seen pixel 2 times
11 + xx yy zz ww --> copy the 4 pixels (xx,yy,zz,ww)
```

So runs of 4 repeating pixels get encoded in 3 bits instead of 8, two repeated pixels are encoded in 2, and finally, a random group of 4 pixels gets encoded in 10 bits. This scheme will serve as the reference naïve codec.
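A minimal greedy encoder for this scheme can be sketched as follows (Python used purely as illustration; the initial value of the “last seen pixel” and the zero-padding of a short final group are my assumptions, not part of the scheme above):

```python
def block_encode(pixels):
    # Greedy encoder for the block scheme; pixels are 2-bit values (0..3).
    # Returns the encoded bit string.
    out = []
    last = 0                # assumed initial "last seen pixel"
    i, n = 0, len(pixels)
    while i < n:
        group = pixels[i:i + 4]
        if len(group) == 4 and len(set(group)) == 1:
            out.append("0" + format(group[0], "02b"))   # run of 4: 3 bits
            last = group[0]
            i += 4
        elif pixels[i:i + 2] == [last, last]:
            out.append("10")                            # repeat last pixel twice
            i += 2
        else:
            quad = (group + [0, 0, 0, 0])[:4]           # assumed: pad a short tail with 0
            out.append("11" + "".join(format(p, "02b") for p in quad))
            last = quad[3]
            i += 4
    return "".join(out)
```

For example, four identical pixels of color 1 encode to the 3-bit code 001, while four arbitrary pixels cost 10 bits.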

RLE. The reference codec isn’t very good at capturing arbitrary lengths of repeating pixels. Maybe we can do better. What could we do? Couldn’t we rather encode arbitrary repetitions? Oh. Wait. That’s what Run Length Encoding, or RLE, is about! But there are many possible flavors of RLE.

All Counts RLE. The simplest, and unsurprisingly the least efficient, variant encodes every run as a (count, pixel) pair. If there’s a run exactly one pixel long, we encode the pair (1,p), where p is the pixel value. If we have 5 pixels with the same value in a row, we encode the pair (5,p), and so on. We can introduce a −1 bias to encode one repetition as 0, 2 as 1, 3 as 2, etc. If we use a fixed number of $n$ bits to encode the run length, this allows us to encode run lengths $1$ to $2^n$ on the range $0$ to $2^n-1$.
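A sketch of All Counts RLE (the 6-bit count width is an assumption; runs longer than the counter can express are simply split):

```python
def all_counts_encode(pixels, count_bits=6):
    # Encode every run as a (count-1, pixel) pair; the -1 bias lets a
    # count field of n bits represent run lengths 1 to 2**n.
    max_run = 1 << count_bits
    pairs = []
    i = 0
    while i < len(pixels):
        j = i
        while j < len(pixels) and pixels[j] == pixels[i] and j - i < max_run:
            j += 1
        pairs.append((j - i - 1, pixels[i]))    # a run of 1 is stored as 0, etc.
        i = j
    return pairs
```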

The Bit Select RLE. Since it is inefficient to encode a long series of different pixels by as many pairs as there are pixels, one can consider an encoding where a bit indicates whether what follows encodes a repeat or a string of literal pixels. A series of literal pixels is simply a series of pixels to be copied as is. The encoding then takes the form (1,rep,p) for a pixel repeated rep+1 times, and (0,rep,p1,p2,…,pn) for a string of n=rep+1 literal pixels.
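A sketch of the Bit Select variant (again with an assumed 6-bit count; a run shorter than 2 pixels is folded into a literal string):

```python
def bit_select_encode(pixels, count_bits=6):
    # (1, rep, p): pixel p repeated rep+1 times.
    # (0, rep, [p1..pn]): rep+1 literal pixels copied as is.
    max_run = 1 << count_bits
    out = []
    i, n = 0, len(pixels)
    while i < n:
        j = i
        while j < n and pixels[j] == pixels[i] and j - i < max_run:
            j += 1
        if j - i >= 2:                          # a run worth a repeat pair
            out.append((1, j - i - 1, pixels[i]))
            i = j
        else:                                   # literals until the next run starts
            k = i
            while (k < n and k - i < max_run and
                   not (k + 1 < n and pixels[k] == pixels[k + 1])):
                k += 1
            out.append((0, k - i - 1, pixels[i:k]))
            i = k
    return out
```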

The Auto-Trigger RLE. Rather than allocating bits to repetition counts for every pixel, we might be better off detecting repetitions in another way. Let’s say we encode pixels one by one until a repetition is detected. A repetition is detected whenever $a$ consecutive pixels are the same. The optimal value of $a$ is determined by the source to encode; for the Pokémon image database, it is $a=2$. So after two repeating pixels, a repeat code is emitted, giving the number of remaining pixels in that run. The count is 0, meaning “0 more repetitions”, if the next pixel breaks the repeating sequence.
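The trigger mechanism can be sketched like this (symbolic ("pix", …)/("run", …) tokens stand in for the actual bit codes):

```python
def auto_trigger_encode(pixels, a=2):
    # Emit pixels verbatim; once `a` identical pixels in a row have been
    # emitted, emit the number of *additional* repetitions (0 if the run
    # stops right there), then resume literal coding.
    out = []
    i, n = 0, len(pixels)
    streak, prev = 0, None
    while i < n:
        out.append(("pix", pixels[i]))
        streak = streak + 1 if pixels[i] == prev else 1
        prev = pixels[i]
        i += 1
        if streak == a:                 # repetition detected: emit a run count
            j = i
            while j < n and pixels[j] == prev:
                j += 1
            out.append(("run", j - i))  # 0 means "no more repetitions"
            i = j
            streak, prev = 0, None
    return out
```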

Variable Length Codes. Fixed-length codes are probably not very efficient to encode run lengths. After looking at the distribution of runs, I found that the following variable length code (or VLC) is nearly optimal:

```
Code           Lengths
00             0
01             1
100  + 1 bit   2,3
101  + 2 bits  4,...,7
110  + 3 bits  8,...,15
1110 + 4 bits  16,...,31
1111 + 5 bits  32,...,63
```

And those will be the variable length codes for the run encoding in all implementations of RLE (except for the Block coder, which isn’t RLE at all).
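The table translates directly into a small encoder (a sketch; lengths above 63 are not representable with this code and are assumed to be split upstream):

```python
def vlc_encode(length):
    # Near-optimal VLC for run lengths 0..63, following the table above.
    if length == 0:
        return "00"
    if length == 1:
        return "01"
    if length <= 3:
        return "100" + format(length - 2, "01b")
    if length <= 7:
        return "101" + format(length - 4, "02b")
    if length <= 15:
        return "110" + format(length - 8, "03b")
    if length <= 31:
        return "1110" + format(length - 16, "04b")
    if length <= 63:
        return "1111" + format(length - 32, "05b")
    raise ValueError("run length out of range for this VLC")
```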

However, the results from the first four variants aren’t all that impressive. Starting at 2 bits per pixel in the original image, the block coder yields a 1.38:1 compression ratio, the All Counts fares similarly with 1.34:1, and the Bit Select is an absolute catastrophe with an insignificant 1.02:1. Finally, the Auto-Trigger RLE yields 1.71:1 compression, which is by far the best amongst the more naïve methods.

Predictive RLE. This method is considerably more complex than the methods presented so far (while still being rather simple in absolute terms). The previous methods are limited by the repetitions found in the pixels. If there’s a regular pattern (as there often is in dithering), the RLE compressor cannot take advantage of it. However, we can transform the original image so that the RLE compressor is able to form much longer runs and therefore yield much better compression.

To do so, we will use prediction. In order to guess the next pixel in the image, we will form contexts and accumulate frequencies of occurrence of pixels given the contexts. The context is defined as the pixel right above and the pixel to the immediate left of the pixel we want to predict.

For the sake of bootstrapping the process, the pixels that fall outside the image are taken to be color 0.

For each of the 16 contexts (four possible colors for each of the two context pixels), we will maintain four counts, one for each color. While we scan the entire image, we form the contexts (c1,c2) and we increment the count corresponding to the color actually observed in the pixel being predicted. Once the image is scanned, we have all contexts populated and we proceed to the second phase.

In the second phase, we establish, for each context, a color ranking. The ranking assigns, to each color in a given context, a rank: the most frequent color receives rank 1, the second most frequent rank 2, and so on. Once ranks are assigned for each color in each context, we discard the frequency information as we won’t be needing it again.

In the third phase, we rescan the image and replace each pixel by its rank given its context. This helps RLE compress longer series of pixels since, even if they were different in the original image, their ranks might be the same given their contexts. Once all pixels are replaced by their ranks, we apply Auto-Trigger RLE with an auto-repeat length of 2.
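The three phases can be sketched as follows (a Python illustration; the image is assumed to be a flat, row-major list of 2-bit pixels, and ranks are 0-based here, 0 being the most frequent color in its context):

```python
def rank_transform(img, w, h):
    # Context of a pixel: (pixel above)*4 + (pixel to the left),
    # with out-of-image pixels taken as color 0.
    def ctx(x, y):
        above = img[(y - 1) * w + x] if y > 0 else 0
        left = img[y * w + (x - 1)] if x > 0 else 0
        return above * 4 + left

    # Phase 1: per-context color frequencies.
    counts = [[0] * 4 for _ in range(16)]
    for y in range(h):
        for x in range(w):
            counts[ctx(x, y)][img[y * w + x]] += 1

    # Phase 2: rank the colors in each context, most frequent first.
    rank = []
    for c in counts:
        order = sorted(range(4), key=lambda col: -c[col])
        r = [0] * 4
        for k, col in enumerate(order):
            r[col] = k
        rank.append(r)

    # Phase 3: replace each pixel by its rank in its context.
    return [rank[ctx(x, y)][img[y * w + x]]
            for y in range(h) for x in range(w)]
```

Note how a tiny image whose columns alternate between two colors comes out as all zeros: every pixel is the most frequent color of its own context, so what was a two-color pattern becomes a single long run.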

This method requires the coding of the rankings, at a cost of 10 bytes: we have 16 contexts, each with a ranking of the 4 colors, and each ranking can be encoded in at most 5 bits since there are only 24 possible permutations of 1,2,3,4 (16×5=80 bits, or 10 bytes).
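One way to fit each ranking into 5 bits is to index the list of all 24 permutations; a sketch (the particular enumeration order is an assumption shared by the encoder and the decoder):

```python
from itertools import permutations

PERMS = list(permutations(range(4)))     # the 24 possible orderings of 4 colors

def perm_to_index(order):
    # 0..23; fits in 5 bits, so 16 contexts cost 16*5 = 80 bits = 10 bytes.
    return PERMS.index(tuple(order))

def index_to_perm(ix):
    return list(PERMS[ix])
```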

The decompression proceeds in reverse: the data is decompressed using Auto-Trigger RLE, then the ranks are replaced, one by one, by colors, using the rank-to-color mappings transmitted for the contexts.
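The inverse transform is causal: a pixel’s context depends only on pixels already decoded, so colors can be rebuilt in a single scan. A sketch, with the same assumptions as before (flat row-major image, 0-based ranks, and order_tables[ctx][rank] giving the color holding that rank in that context):

```python
def rank_inverse(ranked, order_tables, w, h):
    # Rebuild colors from ranks; order_tables is the transmitted
    # rank-to-color mapping, one 4-entry table per context.
    img = [0] * (w * h)
    for y in range(h):
        for x in range(w):
            above = img[(y - 1) * w + x] if y > 0 else 0   # outside = color 0
            left = img[y * w + (x - 1)] if x > 0 else 0
            img[y * w + x] = order_tables[above * 4 + left][ranked[y * w + x]]
    return img
```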

Comparing Results. It is finally the ranked values that are encoded using Auto-Trigger RLE, yielding a much better compression ratio of 1.9:1, a gain of about 12% on the unaided Auto-Trigger RLE, which yields 1.7:1. Bit Select, All Counts, and Block perform so much worse that they can be discarded from the list of potential compression algorithms.

Comparing relative performances yields:

The compression methods compared (bigger is better).

which clearly indicates the predictive codec as a winner. The results are averaged over the 746 test set Pokémon bitmaps.

*
* *

In this post, we have seen two things. First, we have seen that Auto-Trigger RLE is the best variant of RLE because it does not spend special codes on strings of non-repeating pixels, only on pixels that do actually repeat. The optimal number $a$ of repetitions after which a repeat code is emitted depends on the source to code. In the case of the Pokémon database, $a=2$ gave the best results (it was determined by exhaustive testing).

Second, we have seen that transforming the data before compression can actually yield much better results. In this example, we transformed from pixel-space to ranking-space, which yielded longer repetitions than the original data, with very little extra processing cost.

*
* *

On compression ratio fairness. The compression ratios presented are relative to the original image at two bits per pixel. However, if I had measured the compression ratio against the original pnm files, I could have claimed a much higher ratio. Indeed, in the original pnm file, each pixel is 24 bits long (3 components, R, G, B, each 8 bits wide), so I could easily have claimed a 22.8:1 compression ratio using predictive RLE coding! That would be dishonest, I think, even though it would give better compression ratios than Zip on that particular series of files.

Compression ratios should always be expressed relative to the smallest non-compressed representation of the data. In our case, the Pokémons are in fact palette bitmaps, and each pixel has a natural representation on two bits, relative to a four-entry palette of 15-bit colors. Therefore, all compression ratios are reported relative to the original bitmap at two bits per pixel.

*
* *

This also raises the question of the original dimensionality of the data. Not only does one have to ask how many bits are necessary for the minimal natural representation of the data (in our example, 2 bits per pixel plus a palette describe the data adequately) but also how much the data “fills” its possible space. Speaking of fractal dimension may be exaggerated, and not quite applicable, but one has to figure out what patterns are possible (and then merely likely) in the data and how this influences the possible compression algorithms and the obtainable compression ratios. In our case, the Pokémon images share numerous properties: vast blocks of uniform colors; edges formed by white pixels followed by a single dark pixel followed by the fill color; colors simulated by regular dithering (checkerboard patterns of two colors); etc. Clearly, the Pokémons do not occupy a large portion of all possible 4-color bitmaps. Unless there’s a Whitenoisamon, the images have a high degree of structuration, and that is clearly exploitable for compression.

32 Responses to Ad Hoc Compression Methods: RLE

1. Andrew Shuttlewood says:

Just a random approach – how well do the different approaches perform on planar graphics? Would this perhaps yield better compression – especially if you were able to try and limit often changing gradients to occur mostly in one plane by changing the order of the palette?

• Steven Pigeon says:

I guess you would still need some kind of predictive coding to make sure the bits in a given plane repeat as much as possible, regardless of whether or not you use RLE or some “real” compression algorithm such as a LZ variant.

2. josh says:

Why would you bother with all these RLE variations before trying LZSS?

• Steven Pigeon says:

Because even if you use LZSS (which is not quite state of the art either) you would still have to transform the bitmap into a predicted bitmap that increases the numbers of repeats. For example, the predictive coding of the last variation would boost LZSS quite a lot because now the predicted bitmaps have longer strings of repeating symbols.

That and the fact that the post is about RLE, not LZSS ;)

I suppose it would be fun to implement a bit-wise version of LZSS (or LZW or… )

• josh says:

LZSS isn’t state of the art, but decompression is only trivially more complex than for RLE and it has built-in linear prediction in the form of a dictionary. (RLE is LZSS with a fixed offset of 1.)

I suspect that all this bit-fiddling you need to make RLE respectable would make decompression too slow to be useful. Decompression is not the only thing the game needs to do, not even the most important thing. Not to mention that compression of the images is offset by the size of the decompressor.

As with bare RLE, LZSS can be implemented without using a bit stream, which increases speed and reduces code size.

• Pino says:

LZSS works well on the Super NES, Game Boy Advance, and other platforms with comparatively plentiful work RAM. But the problem with LZSS is that it requires continuous access to the previously decompressed data. On 8-bit platforms with little work RAM (e.g. 2 KiB for NES, 8 KiB for Game Boy) and single-ported video RAM, you don’t always have that luxury. Instead, you need to decompress a few tiles and upload them to the video chip during vertical blanking so that you have room in work RAM to decompress a few more tiles.

3. Jon Harrop says:

This is a very interesting challenge! You could probably have done a lot better by posting the data and question as a challenge on the internet. ;-)

What compression ratios do GIF and PNG achieve when you put all 746 images in one large 4-color image?

• Steven Pigeon says:

Zipping -9 the images in a big zip file yield about 11.6:1 compression (the method I propose yields 22.8:1 if we start from the pnm files). Zip doesn’t do very well because it does not know what it is compressing exactly. PNG-ing (-9) them yields only 7.3:1 compression. GIF yields 15.5:1 compression. So the method I propose isn’t all that dismal ;)

(if you look hard enough with google, you can find the data.)

4. plexluthor says:

Are you familiar with the Burrows-Wheeler transform? I’m not sure it would work on the target CPU, since it’s not linear in memory or time, but it tends to make any RLE algorithm work better. I’d be interested in seeing how BWT helps/hurts the performance of the more naive algorithms.

There’s a Dr. Dobbs article here:

http://marknelson.us/1996/09/01/bwt/

• Steven Pigeon says:

Yes, I am. Well, the “target CPU” is all very much hypothetical, and you could assume that you have at least enough memory to compute the F and L sequences at least for a small block of the image, say, maybe 128 pixels. Or 512. Or maybe the whole image. What I do not know is how long the window should be to hit a “sweet spot” that yields maximal compression.

I may add the idea to my to-do list. That’s actually a good suggestion.

5. damien says:

More than 20 years ago, I implemented a compression/decompression algorithm for the Nintendo NES. Same restrictions, slow processor, low memory.

The algorithm I implemented was called LZB, which used no more memory than the decompression buffer. Essentially, the compressed stream was an interleaved set of literal bytes and references to previously decompressed strings.

To make things go faster, you keep those two parts in separate streams – a stream of literal byte strings, and a stream of references to previously decompressed strings.

I think you would find that LZB would way outperform RLE in compression ratio and potentially also speed.

Your transform to context-ranked data would be just as applicable to either algorithm.

• Steven Pigeon says:

I will not ask for source as it is most clearly covered by some NDA, but more details would be welcome.

Got hard numbers on this claim? I’d be interested to see how an LZ77 (or 78?) type algorithm would perform on those. Zip, GIF, and PNG (with -9) do not do very well because the picture is small (short data) compared to what it takes for them to really kick in and yield massive compression ratios. That’s also something you have to consider.

Speed, maybe, but I think that would be mostly an implementation issue, whether or not you happen to have efficient instructions to decode VLC and fill arrays with a given value.

• damien says:

I don’t have the source, as the code was written around 20 years ago at a company that no longer exists. I can tell you that the code I wrote was in use for at least 10 years, compressing all kinds of data in console games, including sprite data, and very fast decompression was achieved.

Decompression is an amazingly simple algorithm. Compression is somewhat more complex and slow, but for this kind of purpose it doesn’t matter.

It’s relatively easy to build this algorithm from scratch. In your case you might build it to operate on units of 2 bits rather than bytes, or chunk a 2×2 pixel area into a byte. All that’s needed are some variable length codes: Golomb codes and their ilk.

I would suggest doing the 2×2 pixel chunking and then running the result through some standard compression algorithms – it will give you a good idea of what is achievable.

There a paper here that mentioned LZB: http://nzcsrsc08.canterbury.ac.nz/research/reports/TechReps/1989/tr_8906.pdf

• Steven Pigeon says:

As I replied to someone else, the standard algorithms zip, PNG, and GIF, even with “compress harder” settings, do not get into the 20:1 and above compression ratios (relative to the original 24-bit RGB bitmap) at all. Since zip, GIF, and PNG do not fare all that well (while not really badly either), and they use good algorithms, I somehow doubt that a simple scheme with Golomb codes (while the offsets may or may not be distributed geometrically) would fare so much better.

However, that TR is interesting. On page 21, it shows the performance of LZB against the window size. As the image is a bitmap (0 or 1), we see that it just barely squeezes under 1 bit per pixel, at 0.95-something bits per pixel, barely better than 1:1.

The problem with a lot of compression algorithms is that when your natural data size gets smaller (two bits per pixel, in our case), the pay-off diminishes quite rapidly because the codes are large compared to the original data size. If you have 4 bits for the offset and 6 bits for the length (as you would have in an LZ77 type algorithm) you have to find a match that’s 5 or more pixels long just to break even or get compression. Using VLC you may get a break-even point of, say, 4 pixels. You’re still very far from 2:1 compression just yet (compared to the original data size at 2 bits per pixel).

• damien says:

As you are working with 2-bit pixels, it makes sense that byte-oriented compressors tend to do badly. Computing the compression ratio relative to 24-bit data doesn’t make much sense though.

A dictionary approach tailored to 2-bit pixels, however, should always work better than any possible RLE approach.

You seem to have acquired a lot of interest in your blog entry. You should put up your data and let people have a go at it with their different techniques. Otherwise, all that will ensue is a bunch of hand waving.

• Steven Pigeon says:

That’s already what GIF does. It uses LZW on an alphabet size of 4 (two bits per pixels). And I’ve shown it doesn’t fare all that well.

6. Steven Pigeon says:

Gwern, from Reddit points to:

http://en.wikipedia.org/wiki/Missingno#Codes_and_glitches

“Unless there’s a Whitenoisamon, the images have a high degree of structuration, and that is clearly exploitable for compression.”

Well, I’m sure we all remember Missingno.!

8. Dark Shikari says:

Another idea is to perform FFV1-like median pixel prediction:

Prediction = Median3( Top, Left, Top + Left – TopLeft)

And then pick as the actual predicted pixel the *closest pixel value in the palette*.

I suspect this would beat your prediction method, but I have never tried median pred on a palette.

• Steven Pigeon says:

It would mispredict at edges and in dithering quite badly. At slanted edges:

w b
b ?

where w=white, b=black, it predicts black, the median of (black, black, black+black−white), whether black+black−white is black or 1/3 (dark) gray. That’s also the case with dithering, where you have checkerboard patterns that look exactly like the slanted edge. Dithering is quite expensive to code, so you don’t want to systematically mispredict it.

In flat edges, it may work:

w b
w ?

it would predict med(white,black, white+black-white) which would be black.

b b
w ?

it would predict med(white,black,black+white-black) = white.

If it predicts correctly, you’re in business. What happens if you predict incorrectly? Well, you still have to encode the error. In the “continuous” domain (where you have plenty of bits for each color), you emit a delta code. If the error is small on average compared to the original resolution, you get some compression.

Now, on a palette, “wrong” gives you a delta of as many bits as there are in the pixels because the palette may not be ordered in any smart way. The delta code is therefore something like a bit for guessed right/wrong, followed by 0 or 2 bits to indicate the correct entry? What would you suggest?

My method will generate runs even if it’s wrong, because it will generate runs of “second guesses” that are RLE-compressible. Using the median predictor (which I guess works pretty well on 24-bit or grayscale images), you have no natural order for your “second guess”.

• Dark Shikari says:

You’d be surprised how good median is; it is the best known spatial-domain pixel predictor short of LPCs and neural networks. There’s a slightly better one that uses 5 median values instead of 3 as well, but it’s pretty much the same concept.

I wasn’t suggesting you change your method of encoding the error–your method is fine (delta codes based on “second best”, etc)–I was suggesting changing the prediction.

If you’re not too pressed for speed you could also implement a simple order-zero arithmetic coder; this would vastly simplify handling of runs and so forth at the cost of a few dozen clocks per binary decision.

• Steven Pigeon says:

The method to encode the error is that there is none! I encode the predicted values’ rank.

Yes, I would guess you are right about how good the median is (probably still much less so than, say, CALIC or some other hierarchical prediction method of the same type). I would guess it leaves residuals nicely distributed as a Laplacian distribution, or similar, which have efficient codes. But the median method assumes that you work with “continuous” values. The palette-based bitmap has no such continuity.

• Dark Shikari says:

On that note, any chance you could post a copy of the original sprite to test on? I could whip up a simple entropy coder and prediction backend and see if I can break 2x compression.

• damien says:

It’s likely a 2 MHz 6502 or Z80 processor.

what are the time constraints on this problem?

• Steven Pigeon says:

As I said in another reply, it’s hypothetical.

• Pino says:

Game Boy has a 4 MHz Z80-derived CPU made by Sharp. It has less than a second to decompress the image.

• Steven Pigeon says:

No, because I do not know where they are from; showing one is certainly fair use. The whole database probably comes from a ROM snapshot or something. I got it from one of those web sites specialized in sprites. Google will help you find it.

9. Steven Pigeon says:

They can be found at http://veekun.com/dex/resources

10. Pino says:

Bee 52 and other games made by Codemasters for the NES use a codec similar in concept to what is called “predictive RLE” above. It operates on a basis of 8×8 pixel tiles made of 8×1 pixel slivers. Each sliver can repeat the sliver above it, or it can be encoded with context from the pixel to its left. See tokumaru’s explanation.

• Steven Pigeon says:

Are there “standard” compression libraries for that kind of platform, apart from, say, LZO or LZSS? Ones that everybody uses? Or does every studio have its own set of home-made compression?

11. […] tile level; and stack the results. His idea was to use “all counts” RLE (as discussed here) using fixed codes, something like a byte for the run count, and a byte to introduce the tile type. […]