Compact Tree Storage

Implementing data structures in a way that uses memory efficiently should always be on your mind. I do not mean going overboard and micro-optimizing memory allocation right down to the bit. I mean organizing data structures in memory so that you can avoid pointers, use contiguous memory segments, and so on. Normally, minimizing storage by avoiding extra pointers when possible will benefit your program in at least two ways.

First, the reduced memory requirement will make your data structure fit in cache more easily. Remember that while pointers are 4 bytes long in 32-bit programs, they are 8 bytes long in 64-bit environments. Fitting in cache yields better run-time performance because you maximize your chances of having the data you need in cache.

Second, contiguous memory layouts also allow for efficient scans of data structures. For example, if you have a classical binary tree, implemented using nodes each holding two pointers, you will have to use a tree traversal algorithm, possibly recursive, to enumerate the tree’s content, which is quite cumbersome if you don’t really care about the order in which the nodes are visited.

It turns out that for a special class of trees, complete trees, there is a contiguous, and quite simple, layout.

Consider the following figure:

A complete binary tree

You notice that each node is numbered, and that the following relations hold between the nodes:

For a node n, you have…

  • that its parent is given by \lfloor \frac{1}{2}(n-1) \rfloor,
  • that its left child is given by 2n+1,
  • that its right child is given by 2n+2,

assuming that the root is numbered 0. This suggests the following layout:

The tree, as seen in a flat layout

where the arrows correspond to parent/child links.

The proof that this mapping works correctly is somewhat self-evident. What is less evident is that such mappings also exist for trees with higher branching factors. In general, if you have a tree with a branching factor of k, you have that, for a node n,…

  • the parent is given by \lfloor \frac{1}{k}(n-1) \rfloor,
  • the children are given by kn+1, kn+2, … kn+k.

So, plugging k=2 into the previous equations gives us back the binary tree case. The cases k\geqslant{}3 correspond to higher-order trees. Letting k=4 gives us a layout for quad-trees, which are extensively used in computer graphics. k=8 yields octrees, which are also used in computer graphics applications such as collision detection, ray tracing, and volume rendering. Note that if you set k=1, you get a list: the parent is correctly given by n-1 and the (only) child by n+1.

The address of the first node in a row, for:

  • k=2, is given by 0,1,3,7,15,… or 2^n-1, and is Sloane’s A000225.
  • k=3, is given by 0,1,4,13,40,121,… or \frac{1}{2}(3^n-1), and is Sloane’s A003462.
  • k=4, is given by 0,1,5,21,85,341,… or \frac{1}{3}(4^n-1), and is Sloane’s A002450.

The general formula is \frac{1}{k-1}(k^n-1).

* *

The cases k=2, k=4, and k=8 can be implemented very efficiently, depending on the processor you are using. Of course, multiplications by powers of two can be replaced by left shifts. On Intel processors, the following two functions:

inline int left_child(int x) { return 2*x+1; }
inline int right_child(int x) { return 2*x+2; }

can each be replaced by a single lea instruction. Assuming the parameter value is passed in eax, left_child should compile to lea eax, [eax*2+1]. Since the usual Intel calling conventions return integer types in eax, we’re done.

* *

If you think this addressing scheme is way cool, you’ll find it even more interesting that the scheme dates from way back [1]. Although Williams considers it only in terms of a binary heap (for heapsort), and expresses the addressing relations only in terms of the address of the parent, one can see from that very terse paper (a page and a few lines) that the addressing can be generalized. Williams’ paper is the earliest reference I could find for the addressing scheme. However, I am not sure that it is the original one; surely the 1964 C. ACM report is based on earlier work?

[1] J. W. J. Williams, “Algorithm 232: Heapsort”, Communications of the ACM, vol. 7, no. 6 (1964), pp. 347–348.

18 Responses to Compact Tree Storage

  1. This is practical only if all branches are of almost the same height; otherwise you get way too many empty cells. Good octrees don’t waste memory on large empty cells, so I doubt this can be readily applied here.

    Also, your tree picture has indices 0-15 excluding 14, and array has the entire 0-15 range.

  2. Steven Pigeon says:

    ah, you’re right about 14/15. I will fix that. Thanks for pointing that one out.

    And yes, if your tree is sparse, you waste a lot of memory. You do have to trade off between the sparseness, the maximum depth of the tree, and the amount of memory you’re willing to allocate. So, you are right, a very sparse or very uneven-depth tree will waste more and more memory as the branching factor grows.

    While wasting memory may seem like a crime, you also have to factor in the cost of new-ing or malloc-ing memory: a cost in time as well as in memory, because allocating 12 bytes does not mean that only 12 bytes are consumed. The allocator may round the request up in an implementation-specific fashion, say to the next multiple of 16, as well as allocate meta-data to manage the allocated block.

  3. Being sparse does indeed have the disadvantage of larger tree nodes, but that is because of explicit children references, not because of allocation overhead (you can minimize overhead by allocating nodes sequentially (ptr += size), either in a huge memory arena, if you manage all memory this way, or in pages big enough to hide the overhead; 16 bytes per 16 KB page is not noticeable).

    • Steven Pigeon says:

      If you do write your own slab allocator, you can indeed save a lot. However, it also leads to complications when you exhaust the slab. Everything is a trade-off; the direct addressing I describe can be used in situations where you know in advance the maximum depth of the tree, and where allocating the contiguous memory up front, even if not all of it is used, is still more interesting than allocating nodes the classical way (which may end up bloating because of both extra pointers and allocation round-up) or using slabs and pointers (which, in some cases, should be replaced by indices).

      It can also be used, as I pointed out, in other data structures, such as heaps. Heaps have all kinds of nice properties, one of which is that you can always “heapify” an array in situ, using the addressing scheme I present this week.

      (As a final note, don’t assume the OS or the C library won’t do wasteful things with memory. For example, in win32s (which may be obsolete, but nonetheless) memory allocations are rounded up to the next multiple of 16 bytes, and the control block associated with the allocated memory is also quite big (on the order of 48 or 64 bytes!). With systems like that, we’d be much better off using either complete tree addressing or your suggestion, that is, allocating a big slab.)

  4. Dan Farina says:

    Consider Prüfer sequences also.

    It’s a very interesting way to represent a tree, compact, and generally amenable to simple run-length-encoding schemes.

    • Steven Pigeon says:

      That’s actually very interesting, I didn’t know about that one.

      I would guess it qualifies as a succinct data structure where a combinatorial mapping is used to encode a data structure’s connectivity.

      Pretty cool, thx for the pointer.

  5. Chris Dew says:

    This’ll be a bit more efficient than you’d expect when dealing with large trees.

    Unused 4KB pages (under linux on x86) will not use physical RAM – that could be a considerable saving, especially if the nodes are large. If the nodes were 4KB, it would be perfect ;-)

  6. Steven Pigeon says:

    That’s true for memory reserved by your process. However, when you allocate a memory block using malloc, the new memory is cleaned up (filled with zeroes) when it is first given to your process by the OS, so that you can’t peek at other processes’ left-overs. That necessarily brings the page into RAM.

    However, if you mmap a file, it is true that a page is not loaded into memory until it is referenced somehow. If a page is not “dirty” (it was only ever accessed for reading), it is not swapped out but merely discarded.


  10. steve says:

    What if you have to insert or delete?

    • Steven Pigeon says:

      If you can add/delete nodes, I would say that a single bit suffices to indicate whether or not a node is allocated. Moreover, I would say that only the leaves (the deepest nodes) would need that bit if you keep your tree as balanced as possible after each insert/delete.

      Think of how an AVL tree rebalances itself after each insert/delete operation. We would need to work out all the details, but I guess that if you kill an internal node (not a leaf), it suffices to move its left child into its place (if you cut with ). You would then need to repeat the same procedure for that child, and so on recursively, until you reach a leaf.

