Python Memory Management (Part II)

January 8, 2013

Last week we had a look at how much memory basic Python objects use. This week, we will discuss how Python manages its memory internally, and why it goes wrong if you’re not careful.

To speed-up memory allocation (and reuse) Python uses a number of lists for small objects. Each list will contain objects of similar size: there will be a list for objects 1 to 8 bytes in size, one for 9 to 16, etc. When a small object needs to be created, either we reuse a free block in the list, or we allocate a new one.

Read the rest of this entry »


Python Memory Management (Part I)

January 1, 2013

[This is a piece I initially wrote while at the LISA at U de M, for the newbie coders in the lab.]

One of the major challenges in writing (somewhat) large-scale Python programs, is to keep memory usage at a minimum. However, managing memory in Python is easy—if you just don’t care. Python allocates memory transparently, manages objects using a reference count system, and frees memory when an object’s reference count falls to zero. In theory, it’s swell. In practice, you need to know a few things about Python memory management to get a memory-efficient program running. One of the things you should know, or at least get a good feel about, is the sizes of basic Python objects. Another thing is how Python manages its memory internally.

So let us begin with the size of basic objects. In Python, there’s not a lot of primitive data types: there are ints, longs (an unlimited precision version of int), floats (which are doubles), tuples, strings, lists, dictionaries, and classes.

Read the rest of this entry »


UEID (Unique Enough IDs, part 2)

March 13, 2012

As part of an open-source project I’m working on (right now, we are still at the technical feasibility stage where we explore and eliminate technical risks, full disclosure will come later) we have to issue session numbers. They’re not session numbers in the usual sense, but they still need to be unique, and not amenable to simple attacks.

There are a couple of ways of generating unique session numbers. RFC 4122-compliant unique IDs is one possible way.

Read the rest of this entry »


Python Iterators

November 29, 2011

An iterator is an essential mechanism of data structure abstraction. An iterator will allow you to walk a data structure, giving you a pointer (or a reference) to the object in the collection corresponding to the current value of the iterator, while hiding as much as possible the implementation details of the underlying data structure. In C++’s Standard Template Library, for example, all collections provide iterators (they don’t exactly all provide the same interface, but that’s another story).

Python also defines something similar to iterators, that is: iterables. But there’s more than one way of getting this done, depending on what exactly we want iterators to do.

Read the rest of this entry »


Introducing Theano & PyLearn

November 15, 2011

Today I am going to talk about my day job a bit. Contrary to previous jobs, a good part (but not all) of what I do now is either public domain or open-source. Two the projects I joined recently are Theano and PyLearn.

Theano is a mathematical expression compiler that maps expressions described in Python to machine-efficient code, either targeting the CPU or the GPU. PyLearn is a work in progress that aims to provide a comprehensive machine-learning framework for Theano.

Read the rest of this entry »


Mild Obfuscation

November 8, 2011

Sometimes, you have a small bit of data, may something like a GUID (for which there are many possible solutions), that you may have to store in a plain-text file, nothing crucial, not sensitive, but that you don’t really want your users to poke with, even if they really mean to. In such cases, you could use encryption, but it may be that mild obfuscation is quite sufficient and dissuasive.

So, if you don’t really want strong encryption, what can you do to provide a machine-efficient encryptionnette?

Read the rest of this entry »


The Complexity of Containers

October 11, 2011

One thing that came on topic rather often recently, and especially in connection with Python, was data structure complexity and what kind of performance operations should offer. However, it’s not as simple as it sounds. First, algorithm operations to be performed dictate, to a great extent, what container, or abstract data structure, should be favored.

But sometimes, these data structures lend themselves to alternate implementations that have different run-time and memory complexities. Maybe we should have a look.

Read the rest of this entry »


Pairing Functions

September 27, 2011

Sometimes you have to encode reversibly two (or more) values onto a single one. The typical example of a pairing function that encodes two non-negative integers onto a single non-negative integer (therefore a function f:\mathbb{Z}^*\times\mathbb{Z}^*\to\mathbb{Z}^*) is the Cantor function, instrumental to the demonstration that, for example, the rational can be mapped onto the integers.

In a more pragmatic way, it may be necessary to encode two values onto one for data compression purposes, or to otherwise exploit a protocol that does not allow many distinct values to be sent separately. The simplest way of pairing two integer is to interleave their bits, for example, with something like:

Read the rest of this entry »


Learning Python (II)

September 20, 2011

One thing I really like about Python is how it makes manipulating lists quite easily, especially via slices and list comprehensions. List comprehensions, especially in generator notation form, are an easy way to create lists with specific content. Other functional programming languages (such as Mathematica, amongst others) have taught me to use “map” to transform lists. But is it always the clearest and fastest way?


Read the rest of this entry »


Run, Python, Run!

September 13, 2011

I still can’t figure out exactly which operations are expensive in Python. My C/C++ can’t help me much because it seems that things aren’t implemented like I’d've expected—like lists that aren’t lists, but array lists, leading to O(n) for operations you would otherwise expect to be O(1).

But a friend of mine—Olivier—showed me a simple, basic, yet rather effective tool to profile Python programs (I’m not sure if I should say script or not).

Read the rest of this entry »


Follow

Get every new post delivered to your Inbox.

Join 41 other followers