Python Memory Management (Part II)

Last week we had a look at how much memory basic Python objects use. This week, we will discuss how Python manages its memory internally, and how things can go wrong if you’re not careful.

To speed up memory allocation (and reuse), Python uses a number of lists for small objects. Each list contains objects of similar size: there is a list for objects 1 to 8 bytes in size, one for objects 9 to 16 bytes, etc. When a small object needs to be created, we either reuse a free block from the appropriate list or allocate a new one.

There are some internal details on how Python organizes those lists into blocks, pools, and arenas: a number of blocks form a pool, pools are gathered into arenas, etc., but they’re not very relevant to the point we want to make (if you really want to know, read Evan Jones’ ideas on how to improve Python’s memory allocation). The important point is that those lists never shrink.

Indeed: if an item (of size x) is deallocated (freed for lack of references), its location is not returned to Python’s global memory pool (and even less to the system), but merely marked as free and added to the free list of items of size x. The dead object’s location will be reused when another object of compatible size is needed. If there are no dead objects available, new blocks are allocated.
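We can catch a glimpse of this recycling from within Python itself. Here is a quick, hedged illustration (the behavior is specific to CPython and not guaranteed by the language): delete a small object, allocate another of similar size, and the interpreter will very often hand back the exact same block.

class Small(object):
    pass

a = Small()
addr = id(a)           # in CPython, id() is the object's memory address
del a                  # the block is marked free and parked on a free list...

b = Small()
print(id(b) == addr)   # ...and is usually handed out again (often True)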

If small object memory is never freed, then the inescapable conclusion is that, like goldfish, these small object lists only keep growing, never shrinking, and that the memory footprint of your application is dominated by the largest number of small objects allocated at any given point.

*
* *

Therefore, one should work hard to allocate only the number of small objects necessary for one task, favoring (otherwise not very pythonesque) loops where only a small number of elements are created/processed at a time rather than (more pythonesque) patterns where lists are created with list comprehension syntax and then processed.

While the second pattern is more à la Python, it is rather the worst case: you end up creating lots of small objects that come to populate the small object lists, and even once the list is dead, those dead objects (now all in the free lists) still occupy a lot of memory.
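To make the contrast concrete, here is a small sketch; work() is a made-up stand-in for whatever per-item processing you actually do. The loop version keeps only a handful of small objects alive at any moment, while the comprehension version materializes a million of them before processing even starts, and their blocks linger on the free lists afterwards.

def work(i):
    return i * i                     # stand-in for real per-item processing

# Pattern 1: process items one at a time; the peak footprint stays small.
total = 0
for i in xrange(1000000):
    total += work(i)

# Pattern 2: build the whole list first; a million intermediate small objects
# are allocated, and their blocks stay on the free lists once the list dies.
values = [work(i) for i in xrange(1000000)]
total = sum(values)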

*
* *

The fact that the free lists grow does not seem like much of a problem, because the memory they contain is still accessible to the Python program. But from the OS’s perspective, your program’s size is the total (maximum) memory ever allocated to Python. Since Python returns memory to the OS from the heap (which allocates objects other than small objects) only on Windows, if you run on Linux you will only ever see the total memory used by your program increase.
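You can watch this from the OS’s side without any profiler, for instance with psutil (which memory_profiler, used below, depends on anyway). This is only a sketch; exact figures vary with the platform and the Python version:

import os
import psutil

proc = psutil.Process(os.getpid())

def rss_mb():
    # resident set size, as seen by the OS
    # (older psutil versions spell this proc.get_memory_info().rss)
    return proc.memory_info().rss / (1024.0 * 1024.0)

print("baseline:  %.1f MB" % rss_mb())
x = range(5000000)                      # a few million small ints
print("allocated: %.1f MB" % rss_mb())
del x                                   # the ints go to free lists, not to the OS
print("after del: %.1f MB" % rss_mb())  # typically stays far above the baseline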

This is a problem if you have to run multiple instances of the same program on the same machine. Even if you have a server-class machine, you may still run into trouble, because suddenly 32 GB of RAM isn’t quite enough.

*
* *

Let us prove my point using memory_profiler, a Python add-on module (which depends on the python-psutil package) by Fabian Pedregosa (see the module’s github page). This add-on provides the @profile decorator, which lets you monitor the memory usage of one specific function. It is extremely simple to use. Consider this small program (it makes my point entirely):

import copy, memory_profiler

@profile
def function():
    x=range(1000000) # allocate a big list
    y=copy.deepcopy(x)
    del x
    return y

if __name__=="__main__":
    function()

invoking

python -m memory_profiler memory-profile-me.py

prints

Filename: memory-profile-me.py

Line #    Mem usage    Increment   Line Contents
================================================
     3                             @profile
     4      9.11 MB      0.00 MB   def function():
     5     40.05 MB     30.94 MB       x=range(1000000) # allocate a big list
     6     89.73 MB     49.68 MB       y=copy.deepcopy(x)
     7     82.10 MB     -7.63 MB       del x
     8     82.10 MB      0.00 MB       return y

This small program creates a list of 1,000,000 ints (at 24 bytes each, for ~24 million bytes) plus a list of references (at 8 bytes each, for ~8 million bytes), for a total of about 30 MB. It then deep-copies the object (which allocates ~50 MB, I am not sure why; a simple copy would allocate only 8 MB of references, and a deep copy should allocate about 30 MB as well). Freeing x with del frees the reference list and kills the associated objects, but lo!, memory usage only goes down by the size of the reference list, because the list itself was not in a small objects’ list but on the heap, while the dead small objects remain in the free lists and are not returned to the interpreter’s global heap.

In this example, we end up with roughly twice the memory allocated, at 82 MB, while only one list, requiring about 30 MB, is actually returned. You can see why it is easy to have memory usage increase, more or less surprisingly, if we’re not careful.
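Note that this is not a garbage collection issue: forcing a collection does not give that memory back either, because the blocks are already considered free by Python, they are simply parked on the free lists. A quick sketch (numbers vary by platform and version):

import copy
import gc

x = range(1000000)
y = copy.deepcopy(x)
del x
gc.collect()   # reclaims reference cycles, but the dead ints stay on the
               # int free list, so the process footprint barely moves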

*
* *

On a related note: is pickle wasteful?

Pickle is the standard way of (de)serializing Python objects to file. What is its memory footprint? Does it create extra copies of the data or is it rather smart about it?

Consider this short example:

import memory_profiler, random, pickle


def random_string():
    return "".join([ chr(64+random.randint(0,25)) for _ in xrange(20) ])


@profile
def create_file():
    x=[ (random.random(),
         random_string(),
         random.randint(0,2**64))
        for _ in xrange(1000000) ]

    pickle.dump(x,open('machin.pkl','w'))
         

@profile
def load_file():
    y=pickle.load(open('machin.pkl','r'))
    return y

if __name__=="__main__":
    create_file()
    #load_file()

There is one invocation to profile the creation of the pickled data, and one to profile re-reading it (you comment out whichever call you do not want). According to memory_profiler, the creation uses a lot of memory:

Filename: test-pickle.py

Line #    Mem usage    Increment   Line Contents
================================================
     8                             @profile
     9      9.18 MB      0.00 MB   def create_file():
    10      9.33 MB      0.15 MB       x=[ (random.random(),
    11                                      random_string(),
    12                                      random.randint(0,2**64))
    13    246.11 MB    236.77 MB           for _ in xrange(1000000) ]
    14                             
    15    481.64 MB    235.54 MB       pickle.dump(x,open('machin.pkl','w'))

and re-reading a bit less:

Filename: test-pickle.py

Line #    Mem usage    Increment   Line Contents
================================================
    18                             @profile
    19      9.18 MB      0.00 MB   def load_file():
    20    311.02 MB    301.83 MB       y=pickle.load(open('machin.pkl','r'))
    21    311.02 MB      0.00 MB       return y

So pickling is very bad for memory consumption: the initial list takes up more or less 230 MB, but pickling it allocates an extra 230-something MB on top of that, quite possibly because the pickler keeps a memo of every object it has already written in order to preserve shared references.
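If the memo is indeed the culprit, one hedged experiment (not part of the measurements above) is the Pickler’s deprecated “fast” mode, which skips the memo entirely. It is only safe when the data contains no shared or recursive references, as is the case for the list of tuples above, and it is worth measuring before relying on it:

import pickle

def dump_fast(x, filename):
    f = open(filename, 'wb')
    p = pickle.Pickler(f, 2)   # binary protocol 2
    p.fast = 1                 # skip the memo: unsafe with shared/recursive refs
    p.dump(x)
    f.close()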

Unpickling, on the other hand, seems fairly efficient. It does end up using more memory than the original list (300 MB instead of 230-something), but it does not double the quantity of allocated memory.

Overall, then, (un)pickling should be avoided for memory-sensitive applications. What are the alternatives? Pickling preserves all the structure of a data structure, so you can recover it exactly from the pickled file at a later time. However, that might not always be needed. If the file is to contain a list as in the example above, then maybe a simple flat, text-based file format is in order. Let us see how it fares.

A naïve implementation would look like this:

import memory_profiler, random

def random_string():
    return "".join([ chr(64+random.randint(0,25)) for _ in xrange(20) ])

@profile
def create_file():
    x=[ (random.random(),
         random_string(),
         random.randint(0,2**64))
        for _ in xrange(1000000) ]

    f=open('machin.flat','w')
    for xx in x:
        print >>f, xx
 
@profile
def load_file():
    y=[]
    f=open('machin.flat','r')
    for line in f:
        y.append(eval(line))
    return y

if __name__=="__main__":
    create_file()
    #load_file()

Creating the file:

Filename: test-flat.py

Line #    Mem usage    Increment   Line Contents
================================================
     8                             @profile
     9      9.19 MB      0.00 MB   def create_file():
    10      9.34 MB      0.15 MB       x=[ (random.random(),
    11                                      random_string(),
    12                                      random.randint(0,2**64))
    13    246.09 MB    236.75 MB           for _ in xrange(1000000) ]
    14                             
    15    246.09 MB      0.00 MB       f=open('machin.flat','w')
    16    308.27 MB     62.18 MB       for xx in x:
    17                                     print >>f, xx

and reading the file back:

Filename: test-flat.py

Line #    Mem usage    Increment   Line Contents
================================================
    20                             @profile
    21      9.19 MB      0.00 MB   def load_file():
    22      9.34 MB      0.15 MB       y=[]
    23      9.34 MB      0.00 MB       f=open('machin.flat','r')
    24    300.99 MB    291.66 MB       for line in f:
    25    300.99 MB      0.00 MB           y.append(eval(line))
    26    301.00 MB      0.00 MB       return y

Memory consumption on writing is now much better. It still creates a lot of temporary small objects (about 60 MB worth), but it is not doubling memory usage. Reading is comparable (using only marginally less memory).
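Two side remarks on this naïve version, neither of which changes the measurements above. First, calling eval() on a file you did not produce yourself is dangerous; ast.literal_eval parses plain literals (tuples, numbers, strings) without executing arbitrary code. Second, if downstream code can consume the tuples one at a time, nothing forces us to hold the full million-entry list in memory at all, in either direction. A sketch of both ideas, assuming the same random_string helper (and imports) as above:

import ast

def write_flat(filename, n=1000000):
    f = open(filename, 'w')
    for _ in xrange(n):
        # write one tuple per line; nothing accumulates in memory
        print >>f, (random.random(), random_string(), random.randint(0, 2**64))
    f.close()

def read_flat(filename):
    # generator: yields one tuple at a time instead of building a big list
    f = open(filename, 'r')
    for line in f:
        yield ast.literal_eval(line)
    f.close()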

*
* *

Python’s design goals are radically different from, say, C’s. While C is designed to give you fine control over what you’re doing at the expense of more complex and explicit programming, Python is designed to let you code rapidly while hiding most (if not all) of the underlying implementation details. While this sounds nice, in a production environment ignoring the implementation’s inefficiencies can bite you hard, and sometimes when it’s too late. I think that having a good feel for how inefficient Python is with memory management (by design!) plays an important role in whether or not your code meets production requirements, scales well, or, on the contrary, turns into a burning hell of memory.

5 Responses to Python Memory Management (Part II)

  1. Evan Jones says:

    Hey! Thanks for referring to my memory management article (it was written a long time ago; I haven’t delved into the implementation recently to see if it is still up to date!) Just for the reference, my patch to get Python to actually eventually *release* these “arenas” back to the operating system was committed back as part of Python 2.5. I never got around to fixing the stuff I talked about in the “future improvements” section (eventually returning the integer, float, dict, and list free lists back to the operating system if they get too large).

    • I noticed that Python 2.5 fared noticeably better than 2.4 memory-wise, so thank you for the patch ;) However, as of 2.7.x (I haven’t really tried 3.x so far), the global heap still only grows on Linux. Do you think Python could use some type of compacting of the heap, or is the reference mechanism essentially not suitable for this?

      • Evan Jones says:

Interesting. I just tried a quick experiment allocating a list of 5 million custom objects then releasing the list. On my Mac (Python 2.7.2) memory usage goes from 1.82 GB to 1.49 GB, and I got similar results on Linux (2.7.3).

        I suspect memory usage doesn’t decrease more because lists and integers have their own free lists (or did 5 years ago), so much of that memory might remain allocated. When I was working on this I believed that fixing this would probably make a significant impact, but the internals may have changed since then.

        My guess (again based on old information) is that compacting the heap would be nearly impossible. Python uses raw pointers to objects in both the core interpreter and in the extension API. It would require a significant rewrite of the internals to be able to move objects.

        • Evan Jones says:

          I just looked at my old post: I’m almost certain the integer free list is a problem in my previous experiment. I just re-ran the following code from my old post:

          l = []
          for i in xrange(5000000):
              l.append({})
          # Measure memory here
          del l

          Memory usage went from something like 1.4 GB to 15 MB. I think this happens because dictionaries do *not* have a free list to reuse their memory, so releasing them ends up releasing most of the memory back to the operating system, whereas before I was allocating custom objects and integers, so there may have been more memory fragmentation.

  2. Evan, you should mention that array.array can be used to store integers and floats with minimal overhead, packed in memory just like C arrays, using just the necessary bytes per item. Anyone handling millions of numbers in Python should be using array.array or — even better — NumPy.
