Python Iterators

An iterator is an essential mechanism of data structure abstraction. An iterator will allow you to walk a data structure, giving you a pointer (or a reference) to the object in the collection corresponding to the current value of the iterator, while hiding as much as possible the implementation details of the underlying data structure. In C++’s Standard Template Library, for example, all collections provide iterators (they don’t exactly all provide the same interface, but that’s another story).

Python also defines something similar to iterators, that is: iterables. But there’s more than one way of getting this done, depending on what exactly we want iterators to do.

The first option is to build a class that offers a method __iter__, returning self, that announces the class as iterable. Each new call returns the next item, and to do so, the next method is called. The last item is detected when next raises StopIteration. For example, it would look pretty much as:

class class_with_iterator:

    def __init__(self, first, last):
        """Creates the class to iterate
           between first and last (inclusive)"""
        self.current = first-1
        self.last = last

    def __iter__(self):
        return self

    def next(self):
        if self.current >= self.last:
            raise StopIteration
        else:
            self.current += 1
            return self.current

The basic skeleton is taken from here. That method is OK if the class itself is a generator of some sort, but as it changes the instance, the method can’t really be used to provide multiple independent iterators on a object.

One solution is to use coroutines, functions that you can call but retain some of its context between calls. The C++ equivalent would be something like a functor, or function-class, or function object. In Python, it’s done by something like:

class multiple_counter:

    def iter(self,first,last):
        current=first
        while current <= last:
            yield current
            current += 1

This method allows multiple and independent calls to iter. For example,

z=multiple_counter()

for x in z.iter(3,10):
    print [ y for y in z.iter(5,8)]

…will produce the expected result.

So this works around the yield statement. It’s basically a return statement that retains the state of the coroutine. When the same instance of the coroutine is called again, execution will resume just after the yield statement. The iterations here stops when the coroutine just exits without yielding.

But coroutines do not even need to be in a class. They’re useful in a class if they abstract something complex that has to do with the internal state of the class, for example, successively visiting the nodes of a tree. But if the only thing you need is a generator, you can move it out of a class and make a standalone generator/iterator:

def triangular(first,last):
    for x in xrange(first,last+1):
        yield x*(x+1)/2

print [ z for z in triangular(3,9) ]

…would print a range of triangular numbers, here:

[6, 10, 15, 21, 28, 36, 45]

(You may have noticed that I replaced the while by an xrange, which is an iterable also!)

*
* *

I know I’ve said this before in another post, but the hardest part of learning a new programming language is to start thinking in the paradigm offered by the language rather than in the paradigms of the languages you already know. In this particular case, coroutines, isn’t exactly new to me. In a previous life, I used Dynamic C, a not-quite C variant that offers a number of different mechanisms for multitasking, both cooperative and preemptive; including coroutines. Too bad the concept isn’t meaningful in (standard) C.

Leave a comment