Python Iterators

November 29, 2011

An iterator is an essential mechanism of data structure abstraction. It lets you walk a data structure, giving you a pointer (or a reference) to the object in the collection at the iterator’s current position, while hiding as much as possible of the implementation details of the underlying data structure. In C++’s Standard Template Library, for example, all collections provide iterators (they don’t all provide exactly the same interface, but that’s another story).

Python has something similar: iterables, objects that can hand out iterators on demand. But there’s more than one way of getting this done, depending on what exactly we want our iterators to do.
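
As a minimal sketch of Python’s protocol (the class names are hypothetical), an iterable implements `__iter__` and returns an iterator, and the iterator implements `__next__`, raising `StopIteration` when it is exhausted. The same iterable can also be written with a generator function as its `__iter__`, which is one of the “more than one way” options.

```python
class CountdownIterator:
    """Iterator: holds the traversal state and implements __next__."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self  # by convention, an iterator is itself iterable

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals the end of the traversal
        value = self.current
        self.current -= 1
        return value


class Countdown:
    """Iterable: each call to iter() hands out a fresh, independent iterator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownGen:
    """Same iterable, but __iter__ is a generator function."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1


print(list(Countdown(3)))      # a for loop or list() calls iter() behind the scenes
print(list(CountdownGen(3)))
```

Because `Countdown` creates a new iterator on every `iter()` call, two nested loops over the same object do not interfere with each other, whereas looping twice over a single `CountdownIterator` would find it already exhausted the second time.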



Hash Functions (checksums, part II?)

November 22, 2011

On a number of occasions, I briefly discussed Hash Functions, saying that a hash function needn’t be very strong if the intent is to use it for look-up, contrary to cryptographic applications. But, unsurprisingly, it’s rather difficult to get a hash function that is both good and fast.

Coming up with a very good hash function isn’t easy, but we can at least make an effort by understanding what makes one hash function better than another, in particular by testing candidates explicitly. Let us have a try at building a good hash function (for look-up) from scratch.
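
To illustrate what “testing them explicitly” can look like, here is a small sketch, assuming FNV-1a (a well-known non-cryptographic hash, not necessarily the one the full post builds) as the candidate and a naive byte sum as a deliberately poor baseline. Keys are hashed into buckets, and a chi-square statistic measures how far the bucket counts stray from a uniform spread. The helper names are hypothetical.

```python
FNV32_OFFSET = 0x811C9DC5  # FNV-1a 32-bit offset basis
FNV32_PRIME = 0x01000193   # FNV-1a 32-bit prime (16777619)


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte in, then multiply by the prime (mod 2**32)."""
    h = FNV32_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
    return h


def additive_hash(data: bytes) -> int:
    """Deliberately poor baseline: similar keys produce nearly identical sums."""
    return sum(data)


def bucket_counts(keys, n_buckets, hash_fn=fnv1a_32):
    """Count how many keys land in each bucket of a hypothetical hash table."""
    counts = [0] * n_buckets
    for key in keys:
        counts[hash_fn(key) % n_buckets] += 1
    return counts


def chi_square(counts):
    """Chi-square statistic against a uniform distribution over the buckets."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


keys = [f"key{i}".encode() for i in range(10000)]
print("FNV-1a:  ", chi_square(bucket_counts(keys, 64)))
print("additive:", chi_square(bucket_counts(keys, 64, additive_hash)))
```

For a well-behaved hash, the statistic should come out roughly around the number of buckets minus one; the additive hash crams the keys into a few dozen neighbouring values and scores far worse.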



Introducing Theano & PyLearn

November 15, 2011

Today I am going to talk about my day job a bit. Unlike at my previous jobs, a good part (but not all) of what I do now is either public domain or open-source. Two of the projects I joined recently are Theano and PyLearn.

Theano is a mathematical expression compiler that maps expressions described in Python to machine-efficient code, targeting either the CPU or the GPU. PyLearn is a work in progress that aims to provide a comprehensive machine-learning framework for Theano.
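
To make the idea of an expression compiler concrete, here is a toy sketch in plain Python. It is not Theano’s API: the expression is first built as a symbolic graph through operator overloading, then “compiled” into a callable. Theano itself generates efficient code for the CPU or GPU; this toy merely emits a Python lambda. `Var`, `Const`, `Op` and `compile_function` are hypothetical names.

```python
class Expr:
    """Base node of a symbolic expression graph."""

    def __add__(self, other):
        return Op("+", self, as_expr(other))

    def __radd__(self, other):
        return Op("+", as_expr(other), self)

    def __mul__(self, other):
        return Op("*", self, as_expr(other))

    def __rmul__(self, other):
        return Op("*", as_expr(other), self)

    def source(self) -> str:
        raise NotImplementedError


class Var(Expr):
    """A named symbolic input."""

    def __init__(self, name):
        self.name = name

    def source(self):
        return self.name


class Const(Expr):
    """A literal constant embedded in the graph."""

    def __init__(self, value):
        self.value = value

    def source(self):
        return repr(self.value)


class Op(Expr):
    """A binary operation node; nothing is computed when it is built."""

    def __init__(self, symbol, left, right):
        self.symbol, self.left, self.right = symbol, left, right

    def source(self):
        return f"({self.left.source()} {self.symbol} {self.right.source()})"


def as_expr(value):
    return value if isinstance(value, Expr) else Const(value)


def compile_function(inputs, output):
    """Turn the graph into a callable by generating Python source for it."""
    params = ", ".join(v.name for v in inputs)
    # eval is acceptable here only because the source was generated by us.
    return eval(f"lambda {params}: {output.source()}")


x, y = Var("x"), Var("y")
z = x * y + 2 * x          # builds a graph, computes nothing yet
f = compile_function([x, y], z)
print(z.source(), "->", f(3, 4))
```

Separating graph construction from compilation is what gives a compiler the chance to analyse and rewrite the whole expression before any of it runs.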



Mild Obfuscation

November 8, 2011

Sometimes you have a small bit of data, maybe something like a GUID (for which there are many possible solutions), that you have to store in a plain-text file. It’s nothing crucial and nothing sensitive, but you’d still rather your users didn’t tamper with it, even deliberately. In such cases, you could use encryption, but mild obfuscation may well be sufficient, and dissuasive enough.

So, if you don’t really want strong encryption, what can you do to provide a machine-efficient encryptionnette?
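
Purely as an illustration, here is one cheap scheme of that kind, assuming XOR with a keystream from a xorshift32 generator, followed by Base64 so the result stays plain text. It is not necessarily the approach the full post takes; the function names and the key are hypothetical. This is obfuscation, not encryption: anyone who has the code and the key can reverse it.

```python
import base64


def _keystream(seed: int):
    """Hypothetical keystream: low byte of a xorshift32 generator (not secure)."""
    state = (seed & 0xFFFFFFFF) or 1  # xorshift must never start at 0
    while True:
        state ^= (state << 13) & 0xFFFFFFFF
        state ^= state >> 17
        state ^= (state << 5) & 0xFFFFFFFF
        yield state & 0xFF


def obfuscate(data: bytes, key: int) -> str:
    """XOR the data with the keystream, then Base64-encode it for a text file."""
    ks = _keystream(key)
    mixed = bytes(b ^ next(ks) for b in data)
    return base64.b64encode(mixed).decode("ascii")


def deobfuscate(text: str, key: int) -> bytes:
    """Reverse obfuscate(): decode, then XOR again with the same keystream."""
    ks = _keystream(key)
    mixed = base64.b64decode(text)
    return bytes(b ^ next(ks) for b in mixed)


secret = b"3f2504e0-4f89-11d3-9a0c-0305e82c3301"
stored = obfuscate(secret, key=0xC0FFEE)
print(stored)
print(deobfuscate(stored, key=0xC0FFEE))
```

Since XOR is its own inverse, the same keystream both hides and restores the data, which keeps the whole thing to a few machine operations per byte.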



Fractional Bits (Part I)

November 1, 2011

Some time ago, I discussed Huffman codes: how they are generated, and how you can (de)code information with them. I also said that they are optimal under various conditions, one of which (that I may or may not have mentioned) is that each symbol is coded with an integer number of bits.

Coding with a non-integer number of bits is counter-intuitive, but it is entirely possible. There are in fact many ways to do it, but let’s start easy and ignore the frequency of occurrence of symbols for now.
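
One simple way to spend a fractional number of bits per symbol, while ignoring frequencies, is to treat a run of symbols from an alphabet of size b as the digits of a single base-b number. This is a sketch of that general idea, not necessarily the scheme the full post develops; `pack`, `unpack` and `bits_needed` are hypothetical names.

```python
def pack(symbols, base):
    """Pack digits in [0, base) into one integer; the first symbol is least significant."""
    value = 0
    for s in reversed(symbols):
        if not 0 <= s < base:
            raise ValueError(f"symbol {s} out of range for base {base}")
        value = value * base + s
    return value


def unpack(value, base, count):
    """Recover `count` symbols by repeatedly taking the remainder modulo base."""
    symbols = []
    for _ in range(count):
        value, s = divmod(value, base)
        symbols.append(s)
    return symbols


def bits_needed(base, count):
    """Bits required to store any packed value of `count` base-`base` symbols."""
    return max(1, (base ** count - 1).bit_length())


symbols = [2, 0, 1, 2, 1]          # five ternary symbols
packed = pack(symbols, 3)
print(packed, unpack(packed, 3, 5), bits_needed(3, 5))
```

Since 3^5 = 243 ≤ 256, five ternary symbols fit in one byte, that is 1.6 bits per symbol instead of 2 bits with a fixed two-bit code, and close to the ideal log2 3 ≈ 1.585 bits.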
