Channel Mixing and Pseudo-Inverses

29/12/2009

Let’s say we want to mix three channels onto two because the communication device has only two available channels, but we still want to emulate a three-channel link. If we can afford coding, it’s not a problem: we can build our own protocol and add any number of channels using a structured data stream. But what if we cannot control the channel coding at all? On CDs, for example, there’s no such coding: there are two channels encoded in PCM, and a standard CD player wouldn’t understand the sound if it were encoded otherwise.

The solution is to mix the three channels in a quasi-reversible way, and in a way that the two channels can be listened to without much interference. One possible way to mix in the third channel is to use a phase-dependent encoding; early “quadraphonic” audio systems did something quite similar. You can also use a plain time-domain “mixing matrix” to mix the three channels onto two. Somewhat arbitrarily, let us choose the matrix:

M=\left[~\begin{array}{ccc} \frac{2}{3} &0&\frac{1}{3}\\ 0 &\frac{2}{3}&\frac{1}{3}\end{array}~\right]
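As a quick numerical sketch (added here for illustration, not taken from the post itself), here is how mixing with this matrix and recovering with its Moore-Penrose pseudo-inverse might look in numpy; the variable names are mine:

```python
import numpy as np

# Mixing matrix: maps 3 channels onto 2.
M = np.array([[2/3, 0.0, 1/3],
              [0.0, 2/3, 1/3]])

# Moore-Penrose pseudo-inverse: maps the 2 mixed channels back to a
# least-squares (minimum-norm) estimate of the 3 original channels.
M_pinv = np.linalg.pinv(M)   # shape (3, 2)

x = np.array([0.5, -0.25, 0.75])   # one sample from each of the 3 channels

mixed = M @ x              # what actually goes onto the 2-channel medium
recovered = M_pinv @ mixed  # approximate reconstruction of the 3 channels

print(mixed)      # 2 values
print(recovered)  # 3 values; an estimate of x, not exact, since M is not invertible
```

The reconstruction is only approximate: M throws away one dimension, and the pseudo-inverse gives back the estimate consistent with the mixed signal that has the smallest norm.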

Read the rest of this entry »


The Frivolous Theorem of Arithmetic

24/11/2009

There’s a theorem that, although its formulation is trivial, is of paramount importance in many things, including data compression. I’m talking about the frivolous theorem of arithmetic, of course. The theorem takes many forms, one being:

Almost all natural numbers are very, very, very large.


Put another way, there are a lot more big numbers than there are small numbers. Of course, this is trivially self-evident. But this trivial theorem can serve as a brutal reality check for many hypotheses. For example, one can use the frivolous theorem of arithmetic to disprove the existence of a lossless data compression method that compresses all inputs to smaller bit strings.
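The counting argument behind that disproof, spelled out here for completeness (it is not part of the excerpt above), is a simple pigeonhole count: there are 2^n distinct bit strings of length exactly n, but only \sum_{k=0}^{n-1} 2^k = 2^n-1 strings strictly shorter than n bits. A lossless compressor must be injective, so it cannot map every length-n input to a shorter output; at least one input must come out the same length or longer.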

Read the rest of this entry »


Ad Hoc Compression Methods: RLE (part II)

15/04/2009

Yestermorning I presented an ad hoc compression method known as RLE, or run-length encoding, and I got quite a few more reactions than I thought I would. Many suggested doing plenty of other stuff despite the fact that the post was about RLE, not about <insert your favorite compression method>. But something very nice happened: people got interested and many suggestions came forth.

I decided to implement the median-based prediction method as suggested by Dark Shikari here. Let us see the results!
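For readers who want the gist of it, here is a minimal sketch of a median predictor; I am assuming the suggestion refers to the standard MED/LOCO-I-style predictor, and this is my own illustration rather than the post’s code:

```python
def med_predict(left, up, up_left):
    """Median (MED/LOCO-I-style) predictor for one pixel.

    Predicts the current pixel from its left, upper, and upper-left
    neighbours; the prediction error (actual - predicted) is what then
    gets run-length encoded, since flat regions produce long runs of zeros.
    """
    if up_left >= max(left, up):
        return min(left, up)
    if up_left <= min(left, up):
        return max(left, up)
    return left + up - up_left
```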

Read the rest of this entry »


Ad Hoc Compression Methods: RLE

14/04/2009

Suppose you have been sub-contracted by a video game company, say Nintendo, to come up with a low-power, low-memory implementation of a popular video game, say, an installment of the infamous Pokémon series. The envisaged device is very slow (it has a 2MHz processor) but has a 220×176 pixel color screen capable of displaying 32768 different colors. It also has a hardware sprite mode, but sprites are limited to 4 distinct colors. You have to encode the complete database of Pokémons, of which there are several hundred, in as little space as possible. Each image varies somewhat in size, but can only have 4 distinct colors from the possible palette of 32768 colors.

A typical Pokémon would look like:

A typical Pokémon image, shown enlarged and in original size.


Since the artists put a lot of effort into drawing the critters in 4 colors, and because of the hardware requirements, you cannot use a compression algorithm that modifies colors or creates more colors than the original image had, thereby excluding JPEG. Moreover, a JPEG codec would be too complex to port to the platform considered. You must therefore use a simple lossless codec.

We will be comparing 5 codecs on the Pokémon image database. We will be looking at a “block” codec, which will serve as a comparison point for a non-RLE codec. We will be looking at four variations on the theme of run-length encoding, or RLE, namely the All Counts, the Bit Select, and the Auto-Trigger RLE; the last variation helps the Auto-Trigger RLE by using predictors, transforming the input image into one that is easier to compress using RLE.
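To fix ideas, here is the basic RLE idea in a few lines of Python; this is only an illustration of the general principle, not any of the specific variants compared in the post:

```python
def rle_encode(pixels):
    """Collapse a row of pixel values into (value, run_length) pairs."""
    runs = []
    i = 0
    while i < len(pixels):
        j = i
        while j < len(pixels) and pixels[j] == pixels[i]:
            j += 1
        runs.append((pixels[i], j - i))
        i = j
    return runs

# Example: a row with long runs of the same 2-bit color index.
print(rle_encode([0, 0, 0, 0, 2, 2, 1, 1, 1, 1, 1, 3]))
# [(0, 4), (2, 2), (1, 5), (3, 1)]
```

The variants compared in the post differ in how the runs and literals are actually serialized to bits, which is where most of the compression is won or lost.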

Read the rest of this entry »


Ad Hoc Compression Methods: Move To Front

03/03/2009

In a previous post, I presented an ad hoc compression method known as digram (digraph, bigram, bigraph) coding, which codes pairs of symbols using unused codes from the original alphabet, yielding a very simple encoder/decoder and sufficiently good compression (in the vicinity of 1.5:1 or so). While clearly not on par with state-of-the-art codecs such as Bzip2 or PAQ8P, this simple codec can still buy you extra space, even when using a very slow/simple processor.

This week, I’ll introduce you to another simple algorithm, Move To Front coding.
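As a taste of what is coming, here is a minimal sketch of move-to-front coding (my own illustration, not the post’s actual implementation):

```python
def mtf_encode(data, alphabet):
    """Move-to-front: emit each symbol's current position in the table,
    then move that symbol to the front, so recently seen symbols get
    small indices (which a later entropy coder can exploit)."""
    table = list(alphabet)
    out = []
    for s in data:
        i = table.index(s)
        out.append(i)
        table.insert(0, table.pop(i))
    return out

def mtf_decode(indices, alphabet):
    table = list(alphabet)
    out = []
    for i in indices:
        s = table[i]
        out.append(s)
        table.insert(0, table.pop(i))
    return out

print(mtf_encode("banana", "abn"))   # [1, 1, 2, 1, 1, 1]
```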

Read the rest of this entry »


Ad Hoc Compression Methods: Digram Coding

17/02/2009

Ad hoc compression algorithms always sound somewhat flimsy, but they still have their usefulness whenever the data you have to compress exhibits very special properties or is extremely short. Universal compression algorithms need a somewhat lengthy input to adapt sufficiently to yield good compression. Moreover, you may not have the computing power (CPU speed, RAM and program space) to implement full-blown compressors. In this case also, an ad hoc algorithm may be useful.

Digram (bigram, digraph, bigraph) coding is one of those ad hoc algorithms. Digram coding compresses text (or any source, really) by finding pairs of symbols that appear often and assigning them codes corresponding to unused symbols, if any.


Let us see in detail what digram compression is, and how effective it really is.
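In the meantime, here is a toy sketch of the general idea (the details differ from the version worked out in the post): count the pairs, then map the most frequent ones to byte values the input never uses.

```python
from collections import Counter

def digram_encode(text, max_pairs=32):
    """Toy digram coder: replace the most frequent symbol pairs with
    single codes taken from byte values unused by the input."""
    used = set(text)
    free_codes = [chr(c) for c in range(256) if chr(c) not in used]
    pairs = Counter(text[i:i + 2] for i in range(len(text) - 1))
    table = {}
    for pair, _ in pairs.most_common():
        if len(table) >= min(max_pairs, len(free_codes)):
            break
        table[pair] = free_codes[len(table)]
    # Greedy left-to-right substitution.
    out, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in table:
            out.append(table[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out), table
```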

Read the rest of this entry »


The 10 (classes of) Algorithms Every Programmer Must Know About

23/12/2008

In Tunnels of Doom!, I wrote that the disjoint sets algorithm is one of the very few algorithms every programmer should know. That got me thinking. Should? What about must? If everyone must know about disjoint sets, what other algorithms must every programmer know about?

I made a “top ten” list of algorithms and data structures every programmer must know about.
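Since disjoint sets are what started the whole list, here is a minimal union-find sketch (path compression plus union by size), just to give a flavor of what such a list covers; it is an illustration, not the list itself:

```python
class DisjointSets:
    """Union-find with path compression and union by size."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True
```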

Read the rest of this entry »


Tunnels of Doom!

16/12/2008

Æons ago, that is, in September 1982, I got my first true computer, the Texas Instruments TI-99/4A. A truly incredible machine, especially considering the era. Its CPU, the TMS 9900, a true 16-bit processor, ran at 3MHz. Even the popular TRS-80 Color Computer, Radio Shack’s 6809-based computer, ran at a mere 0.895MHz, or, if the gods were good to you, you could tweak the clock divisor and crank it up to 1.79MHz.

In addition to being my first programmable computer, the TI-99/4A sported truly nice graphics and a relatively advanced sound chip—for the era—and a number of games were developed for it: Hunt the Wumpus, Parsec, and others. But the one game that really got me interested in programming is Kevin Kenney’s Tunnels of Doom, one of the very first graphical dungeon crawler games. While I am not nostalgic by nature, I returned to that game about a year ago following an unusual route. I was looking into procedural content generation, especially map generation, in relation to data compression, and I remembered that the old game’s dungeons were generated randomly, yet were very well structured. I decided to take a look at it, and see if I could generate maps like his.

Read the rest of this entry »


Length of Phase-in Codes

23/09/2008

In a previous post, I presented the phase-in codes. I stated that they are very efficient when the distribution is uniform: they never do more than \approx 0.086 bits worse than the theoretical lower bound of \lg N, where N is the number of distinct symbols to code and \lg is a convenient shorthand for \log_2. In this post, I derive that result.
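For reference, phase-in (truncated binary) codes can be sketched as follows; this is my own illustration of the construction, not the derivation from the post:

```python
from math import floor, log2

def phase_in_code(x, N):
    """Phase-in (truncated binary) code for symbol x in {0, ..., N-1}.

    With k = floor(lg N) and u = 2^(k+1) - N, the first u symbols get
    k-bit codes and the remaining ones get (k+1)-bit codes, which is why
    the average length stays within a small fraction of a bit of lg N
    when the distribution is uniform.
    """
    k = floor(log2(N))
    u = (1 << (k + 1)) - N
    if x < u:
        return format(x, f"0{k}b") if k > 0 else ""
    return format(x + u, f"0{k + 1}b")

# Example with N = 5: symbols 0..2 get 2-bit codes, 3..4 get 3-bit codes.
print([phase_in_code(x, 5) for x in range(5)])
# ['00', '01', '10', '110', '111']
```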

Read the rest of this entry »


Stretch Codes

09/09/2008

About ten years ago, I was working on a project that needed to log lots of information and the database format that was chosen then was Microsoft Access. The decision seemed reasonable since the data would be exported to other applications and the people who would process the data lived in a Microsoft-centric world, using Excel and Access (and VBA) on a daily basis. However, we soon ran into a major problem: Access does not allow a single file to be larger than 2GB.

After sifting through the basically random error messages that had nothing to do with the real problem, we traced the bug to hitting the maximum file size. “Damn, that’s retarded!” I remember thinking. “This is the year 2000! If we don’t have flying cars, can we at least have databases larger than 2GB!?” It’s not like 2GB was a file size set far off in the future, as exabyte hard drives are today. There were 20-something GB drives available back then, so the limitation made no sense whatsoever to me—and still doesn’t. After the initial shock, I got to thinking about why there was such a limitation, and what bizarre design decision led to it.

Read the rest of this entry »