Last week we had a look at how much memory basic Python objects use. This week, we will discuss how Python manages its memory internally, and why it goes wrong if you’re not careful.
To speed up memory allocation (and reuse), Python uses a number of lists for small objects. Each list contains objects of similar size: one list for objects of 1 to 8 bytes, one for 9 to 16 bytes, and so on. When a small object needs to be created, we either reuse a free block in the appropriate list or allocate a new one.
There are some internal details on how Python manages those lists into blocks, pools, and arenas: a number of blocks form a pool, pools are gathered into arenas, etc., but they’re not very relevant to the point we want to make (if you really want to know, read Evan Jones’ ideas on how to improve Python’s memory allocation). The important point is that those lists never shrink.
Indeed: if an item (of size x) is deallocated (freed by lack of reference), its location is not returned to Python’s global memory pool (let alone to the system), but merely marked as free and added to the free list of items of size x. The dead object’s location will be reused if another object of compatible size is needed. If there are no dead objects available, new ones are created.
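We can actually watch this reuse happen. Here is a minimal sketch (Python 3 syntax, and a CPython implementation detail, since id() returning the object’s address is not guaranteed by the language): we free a small object and observe that the very next allocation of the same size class lands in the same slot.

```python
# CPython implementation detail: a freed small object's slot is handed
# to the next allocation of the same size class (the free list is LIFO).

class Point(object):
    """A small object; all instances have the same size."""
    def __init__(self, x, y):
        self.x, self.y = x, y

p = Point(1, 2)
addr = id(p)        # in CPython, id() is the object's memory address
del p               # the slot goes to a free list, not back to the OS

q = Point(3, 4)     # same size class: the freed slot is reused
reused = (id(q) == addr)
print("slot reused:", reused)   # usually True on CPython
```

The slot was never returned to the operating system; it simply sat on the free list waiting for the next similarly sized object.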
If small-object memory is never freed, then the inescapable conclusion is that, like goldfish, these small-object lists only keep growing, never shrinking, and that the memory footprint of your application is dominated by the largest number of small objects allocated at any given point.
Therefore, one should work hard to allocate only the number of small objects necessary for one task, favoring (otherwise not very pythonèsque) loops where only a small number of elements are created/processed rather than (more pythonèsque) patterns where lists are created using list comprehensions and then processed.
While the second pattern is more à la Python, it is rather the worst case: you end up creating lots of small objects that populate the small-object lists, and even once the list is dead, the dead objects (now all on the free lists) will still occupy a lot of memory.
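As a concrete illustration (in Python 3 syntax; the principle is the same in Python 2 with xrange), a generator expression processes one element at a time and never materializes the full list, whereas a list comprehension allocates a million small objects just to throw them away:

```python
# Memory-hungry: builds a full list of 1,000,000 ints, all of which
# end up on the small-object free lists once the sum is computed.
total_list = sum([i * i for i in range(1000000)])

# Same result, but only one element is alive at a time: the peak
# number of small objects allocated stays tiny.
total_gen = sum(i * i for i in range(1000000))

assert total_list == total_gen
```

Both compute the same value; only the first one permanently inflates the free lists by a million entries.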
The fact that the free lists grow does not seem like much of a problem, because the memory they contain is still accessible to the Python program. But from the OS’s perspective, your program’s size is the total (maximum) memory allocated to Python. Since Python returns memory to the OS from the heap (which holds objects other than small objects) only on Windows, if you run on Linux you will only ever see the total memory used by your program increase.
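You can see this from inside the process with the standard resource module (Unix only; a sketch, and note that ru_maxrss is reported in kilobytes on Linux but in bytes on macOS): the high-water mark climbs when the big list is allocated and does not come back down when it is deleted.

```python
import resource

def peak_rss():
    """Peak resident set size so far (KB on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_rss()
x = list(range(1000000))   # allocate tens of MB of small objects
during = peak_rss()
del x                      # memory goes back to Python's free lists...
after = peak_rss()         # ...but the OS-visible peak never shrinks

assert during >= before
assert after >= during     # in practice, equal: del changed nothing
```

This is exactly what an operator watching top or ps observes: the process grows and stays grown.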
This is a problem if you have to run multiple instances of the same program on the same machine. Even if you have a server-class machine, you may still encounter problems, because suddenly 32GB of RAM isn’t quite enough.
Let us prove my point using memory_profiler, a Python add-on module (which depends on the python-psutil package) by Fabian Pedregosa (see the module’s github page). This add-on provides the decorator @profile that lets one monitor a specific function’s memory usage. It is extremely simple to use. Let us consider this small program (it makes my point entirely):
import copy
import memory_profiler

@profile
def function():
    x = range(1000000)  # allocate a big list
    y = copy.deepcopy(x)
    del x
    return y

if __name__ == "__main__":
    function()
python -m memory_profiler memory-profile-me.py
Filename: memory-profile-me.py

Line #    Mem usage    Increment   Line Contents
================================================
     3                             @profile
     4      9.11 MB      0.00 MB   def function():
     5     40.05 MB     30.94 MB       x = range(1000000)  # allocate a big list
     6     89.73 MB     49.68 MB       y = copy.deepcopy(x)
     7     82.10 MB     -7.63 MB       del x
     8     82.10 MB      0.00 MB       return y
This small program creates a list with 1,000,000 ints (at 24 bytes each, for ~24 million bytes) plus a list of references (at 8 bytes each, for ~8 million bytes), for about 30MB. It then deep-copies the object (which allocates ~50MB, not sure why; a simple copy would allocate only 8MB of references; a deep copy should allocate about 30MB as well). Freeing x with del frees the reference list and kills the associated objects, but lo!, the amount of memory only goes down by the size of the reference list, because the list itself is not in a small objects’ list but on the heap, while the dead small objects remain on the free lists rather than being returned to the interpreter’s global heap.
In this example, we end up with twice the memory allocated, 82MB, while only one list, requiring about 30MB, is returned. You can see why it is easy to have memory usage increase, more or less surprisingly, if we’re not careful.
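We can check the back-of-the-envelope arithmetic with sys.getsizeof. (The figures below are for a 64-bit CPython 3, where a small int is 28 bytes rather than 24, so the estimate lands a bit above 30MB.)

```python
import sys

n = 1000000
per_int = sys.getsizeof(12345)        # 28 bytes on 64-bit CPython 3
x = list(range(n))
ref_bytes = sys.getsizeof(x)          # the list's own array of references
total_mb = (n * per_int + ref_bytes) / 1e6

print("bytes per int:   %d" % per_int)
print("reference array: %.1f MB" % (ref_bytes / 1e6))
print("estimated total: %.1f MB" % total_mb)
```

The estimate (roughly 36MB on a 64-bit CPython 3) agrees with the ~31MB the profiler reports for the 24-byte ints of the interpreter used above.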
On a related note: is pickle wasteful?
Pickle is the standard way of (de)serializing Python objects to a file. What is its memory footprint? Does it create extra copies of the data, or is it rather smart about it?
Consider this short example:
import memory_profiler
import random
import pickle

def random_string():
    return "".join([chr(64 + random.randint(0, 25)) for _ in xrange(20)])

@profile
def create_file():
    x = [(random.random(),
          random_string(),
          random.randint(0, 2**64))
         for _ in xrange(1000000)]
    pickle.dump(x, open('machin.pkl', 'w'))

@profile
def load_file():
    y = pickle.load(open('machin.pkl', 'r'))
    return y

if __name__ == "__main__":
    create_file()
    #load_file()
We invoke it once to profile the creation of the pickled data, and once to re-read it (commenting out the function not being called). Using memory_profiler, the creation uses a lot of memory:
Filename: test-pickle.py

Line #    Mem usage    Increment   Line Contents
================================================
     8                             @profile
     9      9.18 MB      0.00 MB   def create_file():
    10      9.33 MB      0.15 MB       x = [(random.random(),
    11                                        random_string(),
    12                                        random.randint(0, 2**64))
    13    246.11 MB    236.77 MB            for _ in xrange(1000000)]
    14
    15    481.64 MB    235.54 MB       pickle.dump(x, open('machin.pkl', 'w'))
and re-reading a bit less:
Filename: test-pickle.py

Line #    Mem usage    Increment   Line Contents
================================================
    18                             @profile
    19      9.18 MB      0.00 MB   def load_file():
    20    311.02 MB    301.83 MB       y = pickle.load(open('machin.pkl', 'r'))
    21    311.02 MB      0.00 MB       return y
So somehow, pickling is very bad for memory consumption. The initial list takes up more or less 230MB, but pickling it allocates an extra 230-something MB on top of that.
Unpickling, on the other hand, seems fairly efficient. It does allocate more memory than the original list required (300MB instead of 230-something), but it does not double the quantity of allocated memory.
Overall, then, (un)pickling should be avoided for memory-sensitive applications. What are the alternatives? Pickling preserves all the structure of a data structure, so you can recover it exactly from the pickled file at a later time. However, that might not always be needed. If the file is to contain a list, as in the example above, then maybe a simple flat, text-based file format is in order. Let us see how it fares.
A naïve implementation would give:
import memory_profiler
import random

def random_string():
    return "".join([chr(64 + random.randint(0, 25)) for _ in xrange(20)])

@profile
def create_file():
    x = [(random.random(),
          random_string(),
          random.randint(0, 2**64))
         for _ in xrange(1000000)]
    f = open('machin.flat', 'w')
    for xx in x:
        print >>f, xx

@profile
def load_file():
    y = []
    f = open('machin.flat', 'r')
    for line in f:
        y.append(eval(line))
    return y

if __name__ == "__main__":
    create_file()
    #load_file()
Creating the file:
Filename: test-flat.py

Line #    Mem usage    Increment   Line Contents
================================================
     8                             @profile
     9      9.19 MB      0.00 MB   def create_file():
    10      9.34 MB      0.15 MB       x = [(random.random(),
    11                                        random_string(),
    12                                        random.randint(0, 2**64))
    13    246.09 MB    236.75 MB            for _ in xrange(1000000)]
    14
    15    246.09 MB      0.00 MB       f = open('machin.flat', 'w')
    16    308.27 MB     62.18 MB       for xx in x:
    17                                     print >>f, xx
and reading the file back:
Filename: test-flat.py

Line #    Mem usage    Increment   Line Contents
================================================
    20                             @profile
    21      9.19 MB      0.00 MB   def load_file():
    22      9.34 MB      0.15 MB       y = []
    23      9.34 MB      0.00 MB       f = open('machin.flat', 'r')
    24    300.99 MB    291.66 MB       for line in f:
    25    300.99 MB      0.00 MB           y.append(eval(line))
    26    301.00 MB      0.00 MB       return y
Memory consumption on writing is now much better. It still creates a lot of temporary small objects (some 60MB’s worth), but it is not doubling memory usage. Reading is comparable (using only marginally less memory).
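Note that eval on untrusted input is dangerous. A safer, and equally memory-friendly, variant streams the file back with ast.literal_eval (which only accepts literals) and a generator, so the reader never has to materialize the full list at all. A sketch in Python 3, over a small hypothetical file in the same repr-per-line format:

```python
import ast

# Write a few tuples in the flat, repr-per-line format used above.
records = [(0.5, "ABCDE", 42), (0.25, "FGHIJ", 7)]
with open('machin.flat', 'w') as f:
    for r in records:
        f.write(repr(r) + "\n")

def read_records(path):
    """Yield one tuple at a time; unlike eval, ast.literal_eval refuses
    arbitrary code, and only the current line's objects are alive."""
    with open(path) as f:
        for line in f:
            yield ast.literal_eval(line)

loaded = list(read_records('machin.flat'))
assert loaded == records
```

If the consumer can process tuples one at a time instead of calling list() on the generator, peak memory stays proportional to one record rather than to the whole file.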
Python’s design goals are radically different from, say, C’s. While the latter is designed to give you fine control over what you’re doing at the expense of more complex and explicit programming, the former is designed to let you code rapidly while hiding most (if not all) of the underlying implementation details. While this sounds nice, in a production environment ignoring the implementation inefficiencies of a language can bite you hard, and sometimes when it’s too late. I think that having a good feel for how inefficient Python is with memory management (by design!) will play an important role in whether or not your code meets production requirements, scales well, or, on the contrary, will be a burning hell of memory.