Float16

February 12, 2013

The possible strategies for data compression fall into two main categories: lossless and lossy compression. Lossless compression means that you retrieve exactly what went in after compression, while lossy means that some information was destroyed to get better compression, meaning that you do not retrieve the original data, but only a reasonable reconstruction (for various definitions of “reasonable”).

Destroying information is usually performed using transforms and quantization. Transforms map the original data onto a space were the unimportant variations are easily identified, and on which quantization can be applied without affecting the original signal too much. For quantization, the first approach is to simply reduce precision, somehow “rounding” the values onto a smaller set of admissible values. For decimal numbers, this operation is rounding (or truncating) to the nth digit (with n smaller than the original precision). A much better approach is to minimize an explicit error function, choosing the smaller set of values in a way that minimizes the expected error (or maximum error, depending on how you formulate your problem).

Read the rest of this entry »


Flac vs Lame

February 21, 2012

In a previous post I discussed lossless audio encoding and presented a Bash script using flac to rip and process CD audio files. I also commented on how the psycho-acoustic model used in a MP3 encoder will dominate encoding as bit-rate increases, without much quantitative evidence. In this post, I present some.

Read the rest of this entry »


Looking At Flac Compression Ratios

February 7, 2012

In a previous post, I told you about a short script to rip and encode CDs using Flac, and I discussed a bit about how LPC works. In this post, let us have a look on how efficient Flac is.

Let us use a quantitative approach to this. Since I have a great number of songs, we can use statistics to give us a good idea of what kind of compression we can expect.

Read the rest of this entry »


(Random Musings) On Floats and Encodings

January 31, 2012

The float and double floating-point data types have been present for a long time in the C (and C++) standard. While neither the C nor C++ standards do not enforce it, virtually all implementations comply to the IEEE 754—or try very hard to. In fact, I do not know as of today of an implementation that uses something very different. But the IEEE 754-type floats are aging. GPU started to add extensions such as short floats for evident reasons. Should we start considering adding new types on both ends of the spectrum?

The next step up, the quadruple precision float, is already part of the standard, but, as far as I know, not implemented anywhere. Intel x86 does have something in between for its internal float format on 80 bits, the so-called extended precision, but it’s not really standard as it is not sanctioned by the IEEE standards, and, generally speaking, and surprisingly enough, not really supported well by the instruction set. It’s sometimes supported by the long double C type. But, anyway, what’s in a floating point number?

Read the rest of this entry »


Lossless Coding of CD Audio

January 17, 2012

Once upon a time, I discussed how to pick bit-rate for MP3, while considering re-ripping all my CDs. But if I’m to re-rip everything, I might as well do it one last time and use lossless compression.

In this post, we’ll discuss the simple script I cooked up to do just that, and a bit on how Flac works, and how it compares to MP3.

Read the rest of this entry »


Building a Balanced Tree From a List in Linear Time

January 3, 2012

The usual way of forming a search tree from a list is to scan the list and insert each of its element, one by one, into the tree, leading to a(n expected) run-time of O(n \lg n).

However, if the list is sorted (in ascending order, say) and the tree is not one of the self-balancing varieties, insertion is O(n^2), because the “tree” created by the successive insertions of sorted key is in fact a degenerate tree, a list. So, what if the list is already sorted and don’t really want to have a self-balancing tree? Well, it turns out that you can build a(n almost perfectly) balanced tree in O(n).

Read the rest of this entry »


Fractional Bits (Part I)

November 1, 2011

Some time ago, I discussed Huffman codes, how they were generated, and how you could (de)code information with it. I also said that they were optimal under various conditions, one of which (that I may or may not have mentioned) is that you have an integer number of bits.

Coding with an non-integer number of bits is counter-intuitive, but it is entirely possible to do so. There are in fact many ways to do so, but let’s start easy and ignore the frequency of occurrence of symbols for now.

Read the rest of this entry »


Compression 101

October 25, 2011

In this blog, we have discussed data compression more than once, but not really about its philosophy. Data compression is seen as an “enabling technology”—a technology helping make things happen—but what makes it so interesting is that it ultimately gives us insights into the very nature of data, of information itself.

Data compression is not, however, simply about “dropping redundant bits” (although it’s a great way of putting it in order to dismiss the matter with non-technical people), because bits very rarely just stand there being very conspicuously redundant1. No, data compression is about transforming the data into a representation in which there is, after analysis, exploitable redundancy.

Recently, I gave a talk at the LISA, Université de Montréal, entitled Compression 101, introducing the main concepts of data compression, stressing on points I find particularly important.

Read the rest of this entry »


Coding Maps (Part I)

August 30, 2011

Quite a while ago, I discussed map generation in the classic 80s-era game Tunnels of Doom. After a short correspondence with Kevin Kenney himself (who kindly answered my questions; and I hope he is aware that he contributed to the fascination of great many kids in computer science), I manage to, not exactly reproduce his algorithm, but create a number of fun variations.

Tunnels of Doom Combat Screen.

This raises the question as to how do we encode maps efficiently in the computer’s memory, not only for Tunnels of Doooooom but also for any number of other games.

Read the rest of this entry »


Implicit Coding

July 5, 2011

I’m always playing around with data compression ideas, and I recently came across a particular problem that inspired this week’s entry about implicit coding.

The idea of implicit coding is to exploit some constraint in the data that lets you reconstruct one (or more) value(s) unambiguously from the others. The textbook example of implicit coding is to use a constraint such as a known sum. For example, If you have a right triangle you necessarily have that

Read the rest of this entry »


Follow

Get every new post delivered to your Inbox.

Join 41 other followers