Building a Balanced Tree From a List in Linear Time

03/01/2012

The usual way of forming a search tree from a list is to scan the list and insert each of its elements, one by one, into the tree, leading to an (expected) run-time of O(n lg n).

However, if the list is sorted (in ascending order, say) and the tree is not one of the self-balancing varieties, building it by insertion is O(n^2), because the “tree” created by the successive insertions of sorted keys is in fact a degenerate tree, a list. So, what if the list is already sorted and you don’t really want a self-balancing tree? Well, it turns out that you can build an (almost perfectly) balanced tree in O(n).
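As a rough illustration of the idea (my own minimal sketch, not necessarily the construction detailed in the full post), here is C++ code that builds a balanced binary search tree from a sorted array by always making the middle element the root. Each element is visited exactly once, so the construction is O(n); the node type and names are made up for the example.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// A hypothetical, minimal node type for illustration.
struct node
{
    int key;
    node *left, *right;
    node(int k) : key(k), left(nullptr), right(nullptr) {}
};

// Builds an (almost perfectly) balanced BST from the sorted range
// keys[lo, hi) by making the middle element the root and recursing
// on both halves. Each element is visited exactly once: O(n).
node *build_balanced(const std::vector<int> &keys,
                     std::size_t lo, std::size_t hi)
{
    if (lo >= hi) return nullptr;
    std::size_t mid = lo + (hi - lo) / 2;
    node *root = new node(keys[mid]);
    root->left  = build_balanced(keys, lo, mid);
    root->right = build_balanced(keys, mid + 1, hi);
    return root;
}

int main()
{
    std::vector<int> keys = {1, 2, 3, 5, 8, 13, 21};
    node *root = build_balanced(keys, 0, keys.size());
    std::cout << "root key: " << root->key << std::endl; // prints 5
    // (tree deliberately leaked for brevity)
}
```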

Read the rest of this entry »


Fractional Bits (Part I)

01/11/2011

Some time ago, I discussed Huffman codes, how they were generated, and how you could (de)code information with them. I also said that they were optimal under various conditions, one of which (that I may or may not have mentioned) is that each symbol is coded with an integer number of bits.

Coding with a non-integer number of bits is counter-intuitive, but it is entirely possible. There are in fact many ways to do it, but let’s start easy and ignore the frequency of occurrence of symbols for now.
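A concrete taste of the idea, with a made-up example rather than the scheme worked out in the full post: a symbol drawn uniformly from a 3-letter alphabet carries lg 3 ≈ 1.585 bits, and you can approach that fractional cost simply by packing several symbols into one integer in base 3. Five ternary symbols fit in one byte, since 3^5 = 243 ≤ 256, for 1.6 bits per symbol instead of 2.

```cpp
#include <array>
#include <cstdint>
#include <iostream>

// Packs 5 base-3 symbols (each in {0,1,2}) into one byte:
// 3^5 = 243 <= 256, so 5 symbols cost 8 bits, or 1.6 bits each,
// instead of the 10 bits needed at 2 bits per symbol.
std::uint8_t pack5(const std::array<int, 5> &s)
{
    std::uint8_t v = 0;
    for (int i = 4; i >= 0; i--)
        v = 3 * v + s[i]; // base-3 positional coding
    return v;
}

std::array<int, 5> unpack5(std::uint8_t v)
{
    std::array<int, 5> s{};
    for (int i = 0; i < 5; i++)
    {
        s[i] = v % 3;
        v /= 3;
    }
    return s;
}

int main()
{
    std::array<int, 5> symbols = {2, 0, 1, 2, 1};
    std::uint8_t packed = pack5(symbols);
    for (int x : unpack5(packed)) std::cout << x << ' '; // 2 0 1 2 1
    std::cout << '\n';
}
```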

Read the rest of this entry »


Compression 101

25/10/2011

In this blog, we have discussed data compression more than once, but never really its philosophy. Data compression is seen as an “enabling technology”, a technology that helps make things happen, but what makes it so interesting is that it ultimately gives us insights into the very nature of data, of information itself.

Data compression is not, however, simply about “dropping redundant bits” (although it’s a convenient way of putting it when dismissing the matter with non-technical people), because bits very rarely just stand there being very conspicuously redundant. No, data compression is about transforming the data into a representation in which there is, after analysis, exploitable redundancy.
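To make “transforming the data” a little more concrete (this is my own toy illustration, not material from the talk): a slowly varying sequence of samples does not look redundant value by value, but the differences between consecutive samples are small and highly repetitive, which an entropy coder can then exploit.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Delta transform: replaces each value (except the first) by the
// difference with its predecessor. The transform is lossless and
// invertible; on smooth data, the output contains many small,
// repetitive values that an entropy coder can exploit.
std::vector<int> delta_encode(const std::vector<int> &v)
{
    std::vector<int> d(v.size());
    for (std::size_t i = 0; i < v.size(); i++)
        d[i] = (i == 0) ? v[0] : v[i] - v[i - 1];
    return d;
}

int main()
{
    std::vector<int> samples = {1000, 1001, 1003, 1004, 1004, 1006};
    for (int x : delta_encode(samples)) std::cout << x << ' ';
    std::cout << '\n'; // 1000 1 2 1 0 2
}
```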

Recently, I gave a talk at the LISA, Université de Montréal, entitled Compression 101, introducing the main concepts of data compression and stressing the points I find particularly important.

Read the rest of this entry »


Coding Maps (Part I)

30/08/2011

Quite a while ago, I discussed map generation in the classic 80s-era game Tunnels of Doom. After a short correspondence with Kevin Kenney himself (who kindly answered my questions; I hope he is aware that he contributed to the fascination of a great many kids with computer science), I managed, not exactly to reproduce his algorithm, but to create a number of fun variations.

[Image: Tunnels of Doom combat screen.]

This raises the question of how we can encode maps efficiently in the computer’s memory, not only for Tunnels of Doooooom but also for any number of other games.
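As a trivial baseline (my own example, not the encoding the series arrives at), a dungeon-style map can already be stored as a packed bit map, one bit per cell indicating whether it is open or solid, which shrinks a naive one-byte-per-cell layout by a factor of eight.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

// A packed bit map: one bit per map cell (e.g., open vs. solid)
// instead of one byte per cell. Dimensions are hypothetical.
class bitmap
{
public:
    bitmap(int w, int h) : width(w), bits((w * h + 7) / 8, 0) {}

    void set(int x, int y, bool open)
    {
        int i = y * width + x;
        if (open) bits[i / 8] |= std::uint8_t(1u << (i % 8));
        else      bits[i / 8] &= std::uint8_t(~(1u << (i % 8)));
    }

    bool get(int x, int y) const
    {
        int i = y * width + x;
        return (bits[i / 8] >> (i % 8)) & 1u;
    }

    std::size_t bytes() const { return bits.size(); }

private:
    int width;
    std::vector<std::uint8_t> bits;
};

int main()
{
    bitmap map(64, 64); // a 64x64 map packs into 512 bytes
    map.set(10, 20, true);
    std::cout << map.get(10, 20) << ' ' << map.bytes() << " bytes\n";
}
```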

Read the rest of this entry »


Implicit Coding

05/07/2011

I’m always playing around with data compression ideas, and I recently came across a particular problem that inspired this week’s entry about implicit coding.

The idea of implicit coding is to exploit some constraint in the data that lets you reconstruct one (or more) value(s) unambiguously from the others. The textbook example of implicit coding is to use a constraint such as a known sum. For example, if you have a right triangle with legs a and b and hypotenuse c, you necessarily have that c^2 = a^2 + b^2, so one of the three sides never needs to be stored explicitly.
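To illustrate the known-sum case mentioned above (a generic sketch, not the particular problem that inspired the post): if a block of counts is known to sum to a fixed total, the last count never needs to be stored, since it is implied by the others.

```cpp
#include <iostream>
#include <numeric>
#include <vector>

// Implicit coding with a known-sum constraint: if the n counts of a
// block are known to sum to `total`, only the first n-1 need to be
// stored; the last one is reconstructed from the constraint.
std::vector<int> encode_drop_last(const std::vector<int> &counts)
{
    return std::vector<int>(counts.begin(), counts.end() - 1);
}

std::vector<int> decode_with_sum(std::vector<int> stored, int total)
{
    int partial = std::accumulate(stored.begin(), stored.end(), 0);
    stored.push_back(total - partial); // the implicit value
    return stored;
}

int main()
{
    std::vector<int> counts = {3, 7, 1, 5};            // sums to 16
    std::vector<int> stored = encode_drop_last(counts); // {3, 7, 1}
    for (int c : decode_with_sum(stored, 16)) std::cout << c << ' ';
    std::cout << '\n'; // 3 7 1 5
}
```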

Read the rest of this entry »


Huffman Codes

17/05/2011

Some time ago, I presented a piece on compressing voxel worlds, and I just realized that although I have discussed different types of variable-length codes quite a few times, I never took the time to present the basic method of Huffman coding!

The problem of finding an optimal variable-length code is to find a uniquely decodable binary code (that is, a code using only 0 and 1 for which there exists only one way of decoding a message) that encodes each symbol in a number of bits that decreases with its probability of occurrence, ideally close to -lg p bits for a symbol of probability p. A first strategy was devised by Robert Fano, Huffman’s teacher at one point. The so-called Shannon-Fano code is a top-down approach to solving the problem, but it was shown to be suboptimal. The code proposed by David Huffman solves the problem optimally by taking exactly the opposite approach, that is, building the code bottom-up.
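For reference, here is a compact sketch of that bottom-up construction, a plain textbook version rather than code from the post: repeatedly merge the two least probable nodes until a single tree remains, then read each codeword off the root-to-leaf path.

```cpp
#include <iostream>
#include <map>
#include <queue>
#include <string>
#include <vector>

// A node of the Huffman tree: either a leaf (a symbol) or an
// internal node made by merging the two least probable subtrees.
struct node
{
    double p;
    char symbol; // meaningful for leaves only
    node *left, *right;
};

struct heavier // orders the priority queue by increasing probability
{
    bool operator()(const node *a, const node *b) const { return a->p > b->p; }
};

// Walks the tree and collects the codeword of each leaf.
void collect(const node *n, const std::string &prefix,
             std::map<char, std::string> &codes)
{
    if (!n->left && !n->right) { codes[n->symbol] = prefix; return; }
    collect(n->left,  prefix + "0", codes);
    collect(n->right, prefix + "1", codes);
}

std::map<char, std::string> huffman(const std::map<char, double> &freq)
{
    std::priority_queue<node *, std::vector<node *>, heavier> q;
    for (auto &f : freq) q.push(new node{f.second, f.first, nullptr, nullptr});

    while (q.size() > 1) // bottom-up: merge the two least probable nodes
    {
        node *a = q.top(); q.pop();
        node *b = q.top(); q.pop();
        q.push(new node{a->p + b->p, 0, a, b});
    }

    std::map<char, std::string> codes;
    collect(q.top(), "", codes);
    return codes; // (tree deliberately leaked for brevity)
}

int main()
{
    std::map<char, double> freq = {{'a', 0.5}, {'b', 0.25}, {'c', 0.15}, {'d', 0.10}};
    for (auto &c : huffman(freq))
        std::cout << c.first << " -> " << c.second << '\n';
}
```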

Read the rest of this entry »


Jump!

10/05/2011

After five or six years working at the ÉTS, I was offered a position at the LISA (the Laboratoire d’Informatique des Systèmes Adaptatifs) at Université de Montréal… an offer I could not decline.

The LISA is run by Yoshua Bengio, my former Ph.D. adviser and a world-class researcher in machine learning, together with other researchers and many students.

Over the last few months, I gradually shifted my time from the ÉTS to Université de Montréal, but I will be working full time at the LISA from now on.

Read the rest of this entry »


Compressing Voxel Worlds

22/03/2011

A friend of mine, Arthur, is developing a voxel-based game and found himself having to deal with large volumes of data. Unlike 2½D games where the map is essentially a 2D expanse with occasional relief, his game allows the definition of things in full (discrete) 3D.

To avoid loading the entire map in memory, he made the wise design decision of making his world tile-based: by loading only a small set of visible world blocks at any time, it simultaneously reduces the working set and allows an essentially open map. So together we took a look at compressing these world blocks, and it turned out that we can do a lot with fairly simple algorithms and variable-length codes (VLCs).
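As a taste of the kind of “fairly simple algorithm” involved (a generic sketch, not Arthur’s actual format): voxel blocks tend to contain long runs of identical voxels, so even a plain run-length encoding of a block scanned in a fixed order collapses the large uniform regions, and the resulting run lengths are natural candidates for VLCs.

```cpp
#include <cstdint>
#include <iostream>
#include <utility>
#include <vector>

// Run-length encodes a flattened voxel block (one id per voxel,
// scanned in a fixed x/y/z order): each (id, run length) pair stands
// for a run of identical voxels. The run lengths are then natural
// candidates for a variable-length code.
std::vector<std::pair<std::uint8_t, std::uint32_t>>
rle(const std::vector<std::uint8_t> &voxels)
{
    std::vector<std::pair<std::uint8_t, std::uint32_t>> runs;
    for (std::uint8_t v : voxels)
        if (!runs.empty() && runs.back().first == v) runs.back().second++;
        else runs.push_back({v, 1});
    return runs;
}

int main()
{
    // A tiny hypothetical 4x4x4 block: mostly air (0) with a slab of rock (1).
    std::vector<std::uint8_t> block(64, 0);
    for (int i = 48; i < 64; i++) block[i] = 1;

    for (auto &r : rle(block))
        std::cout << int(r.first) << " x" << r.second << '\n';
    // prints: 0 x48
    //         1 x16
}
```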

Read the rest of this entry »


iPod Touch Movies?

11/01/2011

I got myself one of those “retina display” (960×640) 4th-generation iPod Touches for the New Year. First impressions are that it’s rather well integrated, responsive, and has a number of fun applications. It’s even usable as an X/SSH thin client with a $10 app.

[Image used with permission of Emily Carroll.]

But then you try to see how far you can push the device with Linux (my primary operating system) and find that the support is dismal. The support is already not that impressive with Apple’s iTunes running on Mac OS X: iTunes is really slow even when running natively (i.e., without virtualization) and also does some very stupid things, such as preventing you from copying a PDF you downloaded from the web back onto the computer even though it has no DRM, a behavior that is defective by design. Another limitation is in the iPod itself: the video formats it supports are very limited.

Read the rest of this entry »


Picking a Bit-Rate for MP3 Files?

06/04/2010

I am currently contemplating the possibility of recoding all my CDs into new compressed formats. I currently use MP3, but since I started my collection at a time when a huge hard drive was 4 GB, a lot of those files are coded at 96 or 112 kilobits/s… which doesn’t sound all that great. Although MP3 isn’t the greatest format around, it does offer the advantage of being compatible with nearly every player, something neither FLAC nor Ogg Vorbis can claim.

Some of the most recent devices support the addition of codecs, but not all. My Sansa player doesn’t, although I could probably upgrade the firmware or something; my car’s built-in player is another story. So for the time being, I considered recoding everything in MP3 at a higher bit-rate, or maybe using FLAC and writing some kind of script that transcodes the files to MP3 for export to a player. I haven’t decided yet. But let’s say I decide to stick with MP3 all the way: what bit-rate should I use?

Read the rest of this entry »