January 8, 2013
Last week we had a look at how much memory basic Python objects use. This week, we will discuss how Python manages its memory internally, and why it goes wrong if you’re not careful.

To speed-up memory allocation (and reuse) Python uses a number of lists for small objects. Each list will contain objects of similar size: there will be a list for objects 1 to 8 bytes in size, one for 9 to 16, etc. When a small object needs to be created, either we reuse a free block in the list, or we allocate a new one.
Read the rest of this entry »
4 Comments |
data structures, programming, Python | Tagged: memory, memory allocation, memory usage |
Permalink
Posted by Steven Pigeon
January 1, 2013
[This is a piece I initially wrote while at the LISA at U de M, for the newbie coders in the lab.]
One of the major challenges in writing (somewhat) large-scale Python programs, is to keep memory usage at a minimum. However, managing memory in Python is easy—if you just don’t care. Python allocates memory transparently, manages objects using a reference count system, and frees memory when an object’s reference count falls to zero. In theory, it’s swell. In practice, you need to know a few things about Python memory management to get a memory-efficient program running. One of the things you should know, or at least get a good feel about, is the sizes of basic Python objects. Another thing is how Python manages its memory internally.

So let us begin with the size of basic objects. In Python, there’s not a lot of primitive data types: there are ints, longs (an unlimited precision version of int), floats (which are doubles), tuples, strings, lists, dictionaries, and classes.
Read the rest of this entry »
1 Comment |
data structures, Python | Tagged: memory, memory allocation, memory usage |
Permalink
Posted by Steven Pigeon
March 13, 2012
As part of an open-source project I’m working on (right now, we are still at the technical feasibility stage where we explore and eliminate technical risks, full disclosure will come later) we have to issue session numbers. They’re not session numbers in the usual sense, but they still need to be unique, and not amenable to simple attacks.

There are a couple of ways of generating unique session numbers. RFC 4122-compliant unique IDs is one possible way.
Read the rest of this entry »
2 Comments |
algorithms, bit twiddling, hacks, programming, Python | Tagged: RFC 4122, session, session id, SSL, urandom, UUID |
Permalink
Posted by Steven Pigeon
November 29, 2011
An iterator is an essential mechanism of data structure abstraction. An iterator will allow you to walk a data structure, giving you a pointer (or a reference) to the object in the collection corresponding to the current value of the iterator, while hiding as much as possible the implementation details of the underlying data structure. In C++’s Standard Template Library, for example, all collections provide iterators (they don’t exactly all provide the same interface, but that’s another story).

Python also defines something similar to iterators, that is: iterables. But there’s more than one way of getting this done, depending on what exactly we want iterators to do.
Read the rest of this entry »
Leave a Comment » |
C, hacks, Python | Tagged: Generators, Iterators, stl, Triangular Numbers |
Permalink
Posted by Steven Pigeon
November 15, 2011
Today I am going to talk about my day job a bit. Contrary to previous jobs, a good part (but not all) of what I do now is either public domain or open-source. Two the projects I joined recently are Theano and PyLearn.

Theano is a mathematical expression compiler that maps expressions described in Python to machine-efficient code, either targeting the CPU or the GPU. PyLearn is a work in progress that aims to provide a comprehensive machine-learning framework for Theano.
Read the rest of this entry »
Leave a Comment » |
machine learning, Mathematics, programming, Python, Science | Tagged: ATLAS, CUDA, optimization, PyLearn, Theano |
Permalink
Posted by Steven Pigeon
November 8, 2011
Sometimes, you have a small bit of data, may something like a GUID (for which there are many possible solutions), that you may have to store in a plain-text file, nothing crucial, not sensitive, but that you don’t really want your users to poke with, even if they really mean to. In such cases, you could use encryption, but it may be that mild obfuscation is quite sufficient and dissuasive.

So, if you don’t really want strong encryption, what can you do to provide a machine-efficient encryptionnette?
Read the rest of this entry »
3 Comments |
algorithms, bit twiddling, C, C-plus-plus, hacks, Mathematics, programming, Python | Tagged: bit blenders, bit twiddling, Encryption, MAC address, Modular Arithmetic, xor |
Permalink
Posted by Steven Pigeon
October 11, 2011
One thing that came on topic rather often recently, and especially in connection with Python, was data structure complexity and what kind of performance operations should offer. However, it’s not as simple as it sounds. First, algorithm operations to be performed dictate, to a great extent, what container, or abstract data structure, should be favored.

But sometimes, these data structures lend themselves to alternate implementations that have different run-time and memory complexities. Maybe we should have a look.
Read the rest of this entry »
Leave a Comment » |
algorithms, C-plus-plus, data structures, programming, Python | Tagged: hash, Hash Map, list, Priority Queue, Queue, Tree, Vector |
Permalink
Posted by Steven Pigeon
September 27, 2011
Sometimes you have to encode reversibly two (or more) values onto a single one. The typical example of a pairing function that encodes two non-negative integers onto a single non-negative integer (therefore a function
) is the Cantor function, instrumental to the demonstration that, for example, the rational can be mapped onto the integers.

In a more pragmatic way, it may be necessary to encode two values onto one for data compression purposes, or to otherwise exploit a protocol that does not allow many distinct values to be sent separately. The simplest way of pairing two integer is to interleave their bits, for example, with something like:
Read the rest of this entry »
7 Comments |
algorithms, bit twiddling, C, hacks, Mathematics, programming, Python | Tagged: Gödel, Gödel Number, Gödel Numbering, Integers, pairing function |
Permalink
Posted by Steven Pigeon
September 20, 2011
One thing I really like about Python is how it makes manipulating lists quite easily, especially via slices and list comprehensions. List comprehensions, especially in generator notation form, are an easy way to create lists with specific content. Other functional programming languages (such as Mathematica, amongst others) have taught me to use “map” to transform lists. But is it always the clearest and fastest way?

Read the rest of this entry »
4 Comments |
hacks, programming, Python | Tagged: List comprehensions, lists, Python 2.7.1 |
Permalink
Posted by Steven Pigeon
September 13, 2011
I still can’t figure out exactly which operations are expensive in Python. My C/C++ can’t help me much because it seems that things aren’t implemented like I’d've expected—like lists that aren’t lists, but array lists, leading to
for operations you would otherwise expect to be
.

But a friend of mine—Olivier—showed me a simple, basic, yet rather effective tool to profile Python programs (I’m not sure if I should say script or not).
Read the rest of this entry »
6 Comments |
Bash (Shell), Life in the workplace, programming, Python | Tagged: easy_install, Kcachegrind, optimization, Profiling, SquareMap, Trace, Valgrind |
Permalink
Posted by Steven Pigeon