Introducing Theano & PyLearn

15/11/2011

Today I am going to talk about my day job a bit. Contrary to previous jobs, a good part (but not all) of what I do now is either public domain or open-source. Two of the projects I joined recently are Theano and PyLearn.

Theano is a mathematical expression compiler that maps expressions described in Python to machine-efficient code, targeting either the CPU or the GPU. PyLearn is a work in progress that aims to provide a comprehensive machine-learning framework for Theano.

Read the rest of this entry »


Mild Obfuscation

08/11/2011

Sometimes you have a small bit of data, maybe something like a GUID (for which there are many possible solutions), that you have to store in a plain-text file. It is nothing crucial or sensitive, but you don’t really want your users to poke at it, even if they really mean to. In such cases, you could use encryption, but mild obfuscation may be quite sufficient and dissuasive.

So, if you don’t really want strong encryption, what can you do to provide a machine-efficient encryptionnette?
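As one possible illustration (not necessarily the approach taken in the full entry), here is a minimal Python sketch that XORs the bytes with a short repeating key and then Base64-encodes the result. The key value and the names `obfuscate` and `deobfuscate` are assumptions made for this example. This only discourages casual poking; it offers no real security.

```python
import base64
import uuid

# Hypothetical fixed key, chosen arbitrarily for this sketch.
KEY = b"\x5a\xc3\x17\x9e"

def obfuscate(data: bytes, key: bytes = KEY) -> str:
    """XOR each byte with a repeating key, then encode as URL-safe Base64."""
    mixed = bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
    return base64.urlsafe_b64encode(mixed).decode("ascii")

def deobfuscate(text: str, key: bytes = KEY) -> bytes:
    """Reverse obfuscate(): decode Base64, then XOR with the same key."""
    mixed = base64.urlsafe_b64decode(text.encode("ascii"))
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(mixed))

guid = uuid.UUID("12345678-1234-5678-1234-567812345678")
token = obfuscate(guid.bytes)              # text safe to store in a plain-text file
restored = uuid.UUID(bytes=deobfuscate(token))
```

Because XOR is its own inverse, the same key recovers the original bytes, and the Base64 step keeps the stored value printable.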

Read the rest of this entry »


Fractional Bits (Part I)

01/11/2011

Some time ago, I discussed Huffman codes, how they were generated, and how you could (de)code information with them. I also said that they were optimal under various conditions, one of which (that I may or may not have mentioned) is that you have an integer number of bits.

Coding with a non-integer number of bits is counter-intuitive, but it is entirely possible to do so. There are in fact many ways to do so, but let’s start easy and ignore the frequency of occurrence of symbols for now.
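One simple way to see this, sketched here in Python under the assumption that all symbols are equally likely, is to treat a block of symbols as the digits of a single number. With three possible symbols, five of them fit in one byte because 3^5 = 243 ≤ 256, which works out to 1.6 bits per symbol instead of the 2 bits a fixed-length code would need. The names `pack` and `unpack` are hypothetical, and this is not necessarily the method developed in the full entry.

```python
def pack(symbols, base):
    """Pack symbols (each in range(base)) into one integer, first symbol least significant."""
    value = 0
    for s in reversed(symbols):
        if not 0 <= s < base:
            raise ValueError("symbol out of range")
        value = value * base + s
    return value

def unpack(value, base, count):
    """Recover `count` symbols from an integer produced by pack()."""
    symbols = []
    for _ in range(count):
        value, s = divmod(value, base)
        symbols.append(s)
    return symbols

trits = [2, 0, 1, 1, 2]          # five symbols from a 3-letter alphabet
code = pack(trits, 3)            # 200, which fits in a single byte
decoded = unpack(code, 3, len(trits))
```

The ideal cost is log2(3) ≈ 1.585 bits per symbol; larger blocks get closer to it.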

Read the rest of this entry »


Compression 101

25/10/2011

In this blog, we have discussed data compression more than once, but never really its philosophy. Data compression is seen as an “enabling technology” (a technology helping make things happen), but what makes it so interesting is that it ultimately gives us insights into the very nature of data, of information itself.

Data compression is not, however, simply about “dropping redundant bits” (although it’s a great way of putting it in order to dismiss the matter with non-technical people), because bits very rarely just stand there being very conspicuously redundant. No, data compression is about transforming the data into a representation in which there is, after analysis, exploitable redundancy.

Recently, I gave a talk at the LISA, Université de Montréal, entitled Compression 101, introducing the main concepts of data compression and stressing the points I find particularly important.

Read the rest of this entry »


On Hockey Pools

18/10/2011

If you have ever played in hockey pools (or any other kind of pool) you know that if you do not get a good drawing rank, your chances of winning anything are greatly diminished. So, here’s how a typical pool works. There are n pool players who will form “teams” with k league players (usually real players from the real leagues, with their standardized scores) from a total of m league players.

To form the n teams, the n players put their numbers 1, 2, ..., n in a hat, and the numbers are drawn one by one, determining in which order, in each round, pool players will get to choose their next pick among the remaining league players. That is, if the order drawn is, say, 5, 3, 4, 2, 1, then pool player number 5 gets to choose first, picking one league player, then goes pool player 3, and so on, until all pool players have picked a league player, thus completing one round. There are k of those rounds so that each pool player has his team of k league players.
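The drafting procedure described above can be sketched in Python as follows. Two things here are assumptions made for illustration: each league player is reduced to a single numeric score, and every pool player greedily picks the best-scoring league player still available. The function name `run_draft` is hypothetical.

```python
import random

def run_draft(n, k, league_scores, rng):
    """Draw a pick order for n pool players, then run k rounds in that same order."""
    if n * k > len(league_scores):
        raise ValueError("not enough league players for n teams of k")
    order = list(range(1, n + 1))
    rng.shuffle(order)                       # numbers drawn from the hat
    # Assumption: everyone picks greedily, so picks come out best score first.
    best_first = iter(sorted(league_scores, reverse=True))
    teams = {player: [] for player in order}
    for _ in range(k):                       # k rounds
        for player in order:                 # same order in every round
            teams[player].append(next(best_first))
    return order, teams

rng = random.Random(2011)
order, teams = run_draft(n=5, k=3, league_scores=range(1, 21), rng=rng)
totals = [sum(teams[p]) for p in order]      # team totals, in drawing order
```

With 20 league players scored 1 to 20, the totals in drawing order are 45, 42, 39, 36 and 33: under these assumptions, whoever is drawn first ends up with the strongest team.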

Read the rest of this entry »


The Complexity of Containers

11/10/2011

One thing that came up rather often recently, especially in connection with Python, is data structure complexity and what kind of performance each operation should offer. However, it’s not as simple as it sounds. First, the operations an algorithm must perform dictate, to a great extent, which container, or abstract data structure, should be favored.

But sometimes, these data structures lend themselves to alternate implementations that have different run-time and memory complexities. Maybe we should have a look.

Read the rest of this entry »


Programming Challenge: List Intersection

04/10/2011

The problem of computing luminance was rather of a bit-twiddling nature (and some of my readers came up with very creative solutions—far better than my own), and the problem of the Martian calendar was a bit more deductive (and still not solved, though some of the readers came close to a solution); and for the third programming challenge, I propose something a bit more algorithmic/basic data structure in tone.

The problem I propose in this challenge arises in a variety of settings, such as in (simple) search engines, but performing it efficiently is not always trivial.
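The exact statement of the challenge is in the full entry, so the following is only a baseline sketch of one common variant: intersecting two lists that are already sorted and free of duplicates (for instance, lists of document identifiers in a simple search engine). Those preconditions, and the name `intersect_sorted`, are assumptions for this example.

```python
def intersect_sorted(a, b):
    """Return the elements common to two sorted, duplicate-free lists."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1                  # a[i] cannot appear in the rest of b
        elif a[i] > b[j]:
            j += 1                  # b[j] cannot appear in the rest of a
        else:
            common.append(a[i])     # match: advance both lists
            i += 1
            j += 1
    return common

hits = intersect_sorted([1, 3, 5, 7, 9], [3, 4, 5, 6, 9, 10])
```

This merge-style walk takes time proportional to the combined length of the two lists, which makes it a reasonable starting point before trying anything cleverer.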

Read the rest of this entry »


Pairing Functions

27/09/2011

Sometimes you have to encode reversibly two (or more) values onto a single one. The typical example of a pairing function that encodes two non-negative integers onto a single non-negative integer (therefore a function f:\mathbb{Z}^*\times\mathbb{Z}^*\to\mathbb{Z}^*) is the Cantor pairing function, instrumental to the demonstration that, for example, the rationals can be mapped onto the integers.
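For reference, the Cantor pairing function maps (x, y) to (x + y)(x + y + 1)/2 + y. A minimal Python sketch of it and its inverse follows; the function names are hypothetical, and `math.isqrt` requires Python 3.8 or later.

```python
import math

def cantor_pair(x, y):
    """Map two non-negative integers onto one non-negative integer."""
    if x < 0 or y < 0:
        raise ValueError("arguments must be non-negative")
    s = x + y
    return s * (s + 1) // 2 + y

def cantor_unpair(z):
    """Invert cantor_pair(): recover (x, y) from z."""
    w = (math.isqrt(8 * z + 1) - 1) // 2   # index of the diagonal x + y = w
    t = w * (w + 1) // 2                   # first value on that diagonal
    y = z - t
    return w - y, y

z = cantor_pair(5, 3)      # 39
pair = cantor_unpair(z)    # (5, 3)
```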

In a more pragmatic way, it may be necessary to encode two values onto one for data compression purposes, or to otherwise exploit a protocol that does not allow many distinct values to be sent separately. The simplest way of pairing two integers is to interleave their bits, for example, with something like:
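The post's own code is in the full entry; as a stand-in, here is a minimal Python sketch of bit interleaving for non-negative integers, with the bits of x going to even positions and the bits of y to odd positions. The names `interleave` and `deinterleave` and the choice of which value gets the even bits are assumptions for this example.

```python
def interleave(x, y):
    """Interleave the bits of x (even positions) and y (odd positions)."""
    if x < 0 or y < 0:
        raise ValueError("arguments must be non-negative")
    z = 0
    pos = 0
    while x or y:
        z |= (x & 1) << (2 * pos)        # next bit of x
        z |= (y & 1) << (2 * pos + 1)    # next bit of y
        x >>= 1
        y >>= 1
        pos += 1
    return z

def deinterleave(z):
    """Invert interleave(): split z back into (x, y)."""
    x = y = 0
    pos = 0
    while z:
        x |= (z & 1) << pos
        y |= ((z >> 1) & 1) << pos
        z >>= 2
        pos += 1
    return x, y

z = interleave(5, 3)       # 0b101 and 0b011 give 0b011011 = 27
pair = deinterleave(z)     # (5, 3)
```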

Read the rest of this entry »


Wallpaper: Karlův Most, 5h05

22/09/2011

(Karlův Most, 5h05, 1920×1200)

You can find more wallpapers here


Wallpaper: Mes ennemis

22/09/2011

(Mes ennemis, 1920×1200)

You can find more wallpapers here