17/01/2012
Once upon a time, I discussed how to pick bit-rate for MP3, while considering re-ripping all my CDs. But if I’m to re-rip everything, I might as well do it one last time and use lossless compression.

In this post, we’ll discuss the simple script I cooked up to do just that, and a bit on how Flac works, and how it compares to MP3.
Read the rest of this entry »
1 Comment |
algorithms, Bash (Shell), data compression, embedded programming, Mathematics, programming | Tagged: CD, Entropy, Golomb Codes, GSM, Linear Prediction, music, psychoacoustic model, psychoacoustics, sound, Speech, Speech Codec |
Permalink
Posted by Steven Pigeon
10/01/2012
In the previous post of this series, we left off where we were asking ourselves if there was a better way than the selection algorithm of finding the median.

Computing the median of three numbers is a simple as sorting the three numbers (an operation that can be done in constant time, after all, if comparing and swapping are constant time) and picking the middle. However, if the objects compared are “heavy”, comparing and (especially) moving them around may be expensive.
Read the rest of this entry »
4 Comments |
algorithms, C, C-plus-plus, data structures, programming | Tagged: ADL, cmpxchg, median, Sorting Networks |
Permalink
Posted by Steven Pigeon
03/01/2012
The usual way of forming a search tree from a list is to scan the list and insert each of its element, one by one, into the tree, leading to a(n expected) run-time of
.

However, if the list is sorted (in ascending order, say) and the tree is not one of the self-balancing varieties, insertion is
, because the “tree” created by the successive insertions of sorted key is in fact a degenerate tree, a list. So, what if the list is already sorted and don’t really want to have a self-balancing tree? Well, it turns out that you can build a(n almost perfectly) balanced tree in
.
Read the rest of this entry »
6 Comments |
algorithms, data compression, data structures, programming | Tagged: Compact Tree, Compact Tree Storage, Huffman, Huffman Codes, phase-in codes, Search Tree, Segregated Storage, Tree |
Permalink
Posted by Steven Pigeon
27/12/2011
In a previous installment, about filtering noise, we discussed how to use a moving average to even out irregularities in a sequence. Averaging over a window is tricky. First, the window size must make sense in terms of the speed at which the signal changes (in samples), and the average is (overly) sensitive to outliers.

One way to limit the influence of the outliers for denoising is to use the median. However, computing the median is usually more tricky than computing the average, or mean, and this first post (in a series of three, in the next few weeks), discusses how to compute the median efficiently using the selection algorithm.
Read the rest of this entry »
4 Comments |
algorithms, C, C-plus-plus, data structures, programming, theoretical computer science | Tagged: Lomuto, median, QuickSort, selection, Wirth |
Permalink
Posted by Steven Pigeon
22/11/2011
On a number of different occasions, I briefly discussed Hash Functions, saying that if a hash function needn’t be very strong if the intend is to use it for look-up, contrary to cryptographic applications. But, unsurprisingly, it’s rather difficult to get a good fast hash function.

Coming up with a very good hash function isn’t easy, but we can at least make an effort to build one by understanding what makes a better hash function, especially by explicitly testing them. Let us have a try at building a good hash function (for look-up) from scratch.
Read the rest of this entry »
Leave a Comment » |
algorithms, bit twiddling, C-plus-plus, data structures, Mathematics, programming | Tagged: hash, hash function, Hash Map, hashes for look-up, hashing, secure hashes |
Permalink
Posted by Steven Pigeon
15/11/2011
Today I am going to talk about my day job a bit. Contrary to previous jobs, a good part (but not all) of what I do now is either public domain or open-source. Two the projects I joined recently are Theano and PyLearn.

Theano is a mathematical expression compiler that maps expressions described in Python to machine-efficient code, either targeting the CPU or the GPU. PyLearn is a work in progress that aims to provide a comprehensive machine-learning framework for Theano.
Read the rest of this entry »
1 Comment |
machine learning, Mathematics, programming, Python, Science | Tagged: ATLAS, CUDA, optimization, PyLearn, Theano |
Permalink
Posted by Steven Pigeon
08/11/2011
Sometimes, you have a small bit of data, may something like a GUID (for which there are many possible solutions), that you may have to store in a plain-text file, nothing crucial, not sensitive, but that you don’t really want your users to poke with, even if they really mean to. In such cases, you could use encryption, but it may be that mild obfuscation is quite sufficient and dissuasive.

So, if you don’t really want strong encryption, what can you do to provide a machine-efficient encryptionnette?
Read the rest of this entry »
4 Comments |
algorithms, bit twiddling, C, C-plus-plus, hacks, Mathematics, programming, Python | Tagged: bit blenders, bit twiddling, Encryption, MAC address, Modular Arithmetic, xor |
Permalink
Posted by Steven Pigeon
11/10/2011
One thing that came on topic rather often recently, and especially in connection with Python, was data structure complexity and what kind of performance operations should offer. However, it’s not as simple as it sounds. First, algorithm operations to be performed dictate, to a great extent, what container, or abstract data structure, should be favored.

But sometimes, these data structures lend themselves to alternate implementations that have different run-time and memory complexities. Maybe we should have a look.
Read the rest of this entry »
Leave a Comment » |
algorithms, C-plus-plus, data structures, programming, Python | Tagged: hash, Hash Map, list, Priority Queue, Queue, Tree, Vector |
Permalink
Posted by Steven Pigeon
04/10/2011
The problem of computing luminance was rather of a bit-twiddling nature (and some of my readers came up with very creative solutions—far better than my own), and the problem of the Martian calendar was a bit more deductive (and still not solved, though some of the readers came close to a solution); and for the third programming challenge, I propose something a bit more algorithmic/basic data structure in tone.

The problem I propose in this challenge arises in a variety of settings, such in (simple) search engines, but performing it efficiently is not always trivial.
Read the rest of this entry »
6 Comments |
algorithms, data structures, programming | Tagged: Mathematica, Matlab, Search Engines |
Permalink
Posted by Steven Pigeon
27/09/2011
Sometimes you have to encode reversibly two (or more) values onto a single one. The typical example of a pairing function that encodes two non-negative integers onto a single non-negative integer (therefore a function
) is the Cantor function, instrumental to the demonstration that, for example, the rational can be mapped onto the integers.

In a more pragmatic way, it may be necessary to encode two values onto one for data compression purposes, or to otherwise exploit a protocol that does not allow many distinct values to be sent separately. The simplest way of pairing two integer is to interleave their bits, for example, with something like:
Read the rest of this entry »
8 Comments |
algorithms, bit twiddling, C, hacks, Mathematics, programming, Python | Tagged: Gödel, Gödel Number, Gödel Numbering, Integers, pairing function |
Permalink
Posted by Steven Pigeon