Learning Python (II)

One thing I really like about Python is how it makes manipulating lists quite easily, especially via slices and list comprehensions. List comprehensions, especially in generator notation form, are an easy way to create lists with specific content. Other functional programming languages (such as Mathematica, amongst others) have taught me to use “map” to transform lists. But is it always the clearest and fastest way?

The basic list comprehension to generate a list of integers looks like:

[i for i in xrange(1000)]

where xrange(n) is an iterator that does not create a list as would range(n). (The range(n) expression creates an explicit list [0,1,...,n] and therefore using potentially tons of memory, while the xrange(n) merely create an “iterable” (something of a pair composed of an iterator and an iterator limit).) So the above creates a list. Let’s make it large:

fat_list=[i for i in xrange(1000000)]

Let us say that, later on, we need to apply an operation on it, for example negate its elements. The most pythonèsque way to this is:

a_list=[-x for x in fat_list]

Note that fat_list might have been modified at some point in the past, but it still has 1000000 elements. Hmm, so what if we used map? With a lambda, it becomes:

b_list=map(lambda x:-x,fat_list)

or maybe an external function?

def invert(x): return -x

or a call to invert from a list comprehension?

d_list=[ invert(x) for x in fat_list]

Which one is the most legible? Probably the [-x for x in fat_list] version. But which is the fastest? Let us run each test 100 times to drown-out OS noise. With Python 2.7.1, on my AMD box, we get:

“Create” represents the time needed to just create the list, in seconds. “Negate” is the simple list comprehension version. “Lambda” is the map version using a lambda. “Map+Call” is the map calling the invert function. Lastly, “[Call]” is the version of the list comprehension that calls invert.

If we factor out list creation time, we get:

The simple list comprehension is much faster than the other variants, without dispute. The lambda version and the map calling invert are much slower, but compared to each other, with the calling version just marginally slower. But the big surprise comes from the list comprehension calling invert: it is much slower.

I am not sure if the results are following my intuition or go against it. The simplest expression is the fasted, as expected, but in some measure, you’d expect the map and list comprehension method calling the function would map to essentially the same interpreter code, but for some reason, they do not. The list comprehension calling a function is approximately 25% slower.

* *

I would like to say that the morale of the story is that you should always prefer legibility over execution speed, but my experience tells me that it’s a delicate trade-off. In this quite simple experiment, the naïve list comprehension beats the other methods flat; but it is also clear that if you’re to apply a (non-lambda) function to a list, map is better than a list comprehension (and for some reason, I’d rather have this function called apply than map).

4 Responses to Learning Python (II)

  1. Evan Jones says:

    This goes to your point that you need to understand how things are implemented to understand the performance choices:

    The reason the list comprehension with the named function call is more expensive is because it must perform Python function lookup on each iteration. After all, your call to invert() might dynamically change what invert() refers to. map+call doesn’t have this overhead because it looks up the name “invert” only once, then calls that function object many times. See the example program I pasted at the bottom for how this can change the semantics.

    I suspect the difference between the plain comprehension, the lambda, and the map+call versions are function call overhead? I haven’t looked at the compiled Python bytecode, but my recollection is that list comprehensions are executed in the function where they appear, whereas the lambda and map+call versions need to call and return from a function many times. I don’t know if there should be any difference between the lambda and map+call.

    Example of the reason name lookups might matter:

    l = [1, 2, 3, 4]

    def negate(x):
    global transform
    transform = doNothing
    return -x

    def doNothing(x):
    global transform
    transform = negate
    return x

    transform = negate

    print “comprehension:”, [transform(x) for x in l]
    print “map:”, map(transform, l)

    • Steven Pigeon says:

      I think Python has to look up the function from its name everytime, but the difference between map(invert,z) and [invert(x) for x in z] should be negligible. At least, that’s what I expected.

      There might be a small overhead in [invert(x) for x in z] because the for x part supports a conditional (you can write for example [x for x in z if odd(x)], but the observed difference seems large.

  2. Evan Jones says:

    That is actually not true: the map(invert, z) looks up the name “invert” once, then passes in the function object to map, which uses it in its inner loop. The list comprehension version looks up the name "invert" every time, in its inner loop. See the example I posted for how this can change semantics: If I rebind invert to something else, the list comprehension will "see" the change, while the map version does not.

  3. Evan Jones says:

    Whoops, hit reply too soon. I guess what this indicates is that looking up variable in Python modules is slower than local variables.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: