Compact Tree Storage

April 7, 2009

Implementing data structures in a way that uses efficiently memory should always be on your mind. I do not mean going overboard and micro-optimizing memory allocation right down to the bit. I mean organize data structures in memory so that you can avoid pointers, so that you can use contiguous memory segments, etc. Normally, minimizing storage by avoiding extra pointers when possible will benefit your program in at least two ways.

First, the reduced memory requirement will make your data structure fit in cache more easily. Remember that if pointers are 4 bytes long in 32 bits programming, they are 8 bytes long in 64 bits environments. This yields better run time performance because you maximize your chances of having the data you need in cache.

Second, contiguous memory layouts also allow for efficient scans of data structures. For example, if you have a classical binary tree, implemented using nodes having each two pointers, you will have to use a tree traversal algorithm, possibly recursive, to enumerate the tree’s content. If you don’t really care about the order in which the nodes are visited, what’s quite cumbersome.

It turns out that for special classes of trees, complete trees, there is a contiguous, and quite simple, layout.

Read the rest of this entry »


Stretch Codes

September 9, 2008

About ten years ago, I was working on a project that needed to log lots of information and the database format that was chosen then was Microsoft Access. The decision seemed reasonable since the data would be exported to other applications and the people who would process the data lived in a Microsoft-centric world, using Excel and Access (and VBA) on a daily basis. However, we soon ran into a major problem: Access does not allow a single file to be larger than 2GB.

After sifting through the basically random error messages that had nothing to do with the real problem, we isolated the bug as being reaching the maximum file size. “Damn that’s retarded!” I remember thinking. “This is the year 2000! If we don’t have flying cars, can we at least have databases larger than 2GB!?“. It’s not like 2GB was a file size set far off into the future as are exabytes hard-drives today. There were 20-something GB drives available back then, so the limitation made no sense whatsoever to me—and still doesn’t. After the initial shock, I got thinking about why there was such a limitation, what bizarre design decision lead to it.

Read the rest of this entry »