Short Pointers

One good thing with 64 bits addresses, is that you can, in principle, use essentially as much memory as you want—the address space certainly exceeds today’s computers’ capabilities. One bad thing, especially when you create lots of objects and need plenty of pointers, is that 64 bits pointers are big. They use 8 bytes of memory. One or two pointers aren’t a problem, of course, but what if your data structure is a sparse graph, each node being mostly pointers, and that you need to create a very large graph?

pincer-grip

One solution is to use stretch codes, as I proposed a while ago, trading off precision of addressing for shorter pointers. However, unless you rewrite the memory allocator, the technique won’t play well with the default new. Another solution is to store just barely the number of bits (rounded to bytes) necessary to hold an address. Can we do this? If so, how?

The first thing is to know the address space for your processor.
Typing

> grep address /proc/cpuinfo

in a shell will give you the address characteristics of your CPU. You will get something like:

address sizes	: 36 bits physical, 48 bits virtual

This means that this computer uses 48 bits for the virtual addresses, and those 48 bits are mapped onto 36 in physical memory. So if we could just use 48 bits, we’d save 25% right off the bat over 64 bits pointers. But in fact, we can do better if we understand how virtual memory works in Linux (and on x86 and x86_64 processors).

The first really weird thing is that the stack grows down on these systems. The bottom of the stack, at the top of the process memory, is relative to the address 0x7fffffffffff (if you have 48 virtual address bits). That’s the largest address you can have in your program (unless you’re in kernel space, then the address starts with 0xffff800000000000). The data allocated in the heap are relative to the address zero, at the bottom of the memory. Note that this address zero is not the system-wide, physical, memory address zero, it’s the address zero relative to your program’s segment. Addresses in heap, therefore, are smaller than addresses on the stack, and their size will be proportional to the amount of physical memory you have. If you have 16GB, then you’ll have addresses of about 34 bits, since 4GB requires 32 bits, and you have 4 times that, so two more bits. Playing safe, you can guess that heap-allocated data has an address of at most 40 bits.

*
* *

OK, so now how do we exploit this idea or using shorter pointers? One possible solution is to have a parametric class (a template) that takes in input the type of the pointer (we’re still interested in type safety) and the number of bits (which will be rounded up to bytes) used to store the pointer.

Let’s consider the case of heap pointers (those “small” pointers), those with a number of leading zeroes. The implementation I propose (which may not be the absolute most efficient one) is as follows:

#ifndef __MODULE_SHORTEST_PTR__
#define __MODULE_SHORTEST_PTR__

#ifdef __SHORT_PTR__

#include <stdint.h>

template <typename T, int bits=SHORT_PTR_BITS>
class shorter_ptr
 {
 private:

  // compresses BITS, up to max_bytes
  // (8 for 64 bits address, put more
  // if needed)
  enum { nb_bytes=(bits+7)/8, max_bytes=8 };
  
  uint8_t ptr[nb_bytes];

  void squish(const T * a)
   {
    const uint8_t * p_a=(const uint8_t*)&a;
    for (int i=0;i<nb_bytes;i++)
     ptr[i]=p_a[i];
   }

  T * unsquished() const 
   {
    // trick to avoid casting and around
    // "dereferencing type-punned pointer
    // will break strict-aliasing rules 
    // [-Wstrict-aliasing]"
    union 
     {
      uint8_t p_a[max_bytes];
      T * p;
     };

    int i;
    for (i=0;i<nb_bytes;i++)
    p_a[i]=ptr[i];

    for (;i<max_bytes;i++)
     p_a[i]=0;

    return p;
   }

 public:

  T * operator=(const T * & other) { squish(other); return unsquished(); }
  operator T *() const { return unsquished(); }
  T * operator->() const { return unsquished(); }

  shorter_ptr(T * a) { squish(a); }
  shorter_ptr() { squish(nullptr); }
 };
#else
template <typename T, int bits> 
 using shorter_ptr=T *;
#endif


#endif
 // __MODULE_SHORTEST_PTR__

Note that it has two symbols defined. The first one, __SHORT_PTR__ conditionally activates the use of short pointers. If it’s not defined, the template evaluates to a simple pointer. If it’s defined, then SHORT_PTR_BITS provides a (reasonable) default pointer size.

The implementation relies on the two functions squish and unsquish that actually do the computation of dropping the leading zeroes of the pointer on storage and expanding back the pointer on read.

The use of the class is rather simple:

thingie * t = new thingie();

shorter_ptr<thingie,40> ptr(t);

std::cout
  << t << std::endl
  << ptr << std::endl;

It therefore behaves like a normal pointer. Well, mostly, because this proof of concept doesn’t provide full pointer arithmetic. It would not be very difficult to do so, because it only requires to know sizeof(T) and to provide overloads for the usual operators, like ++ (both variants), +=, etc.

*
* *

A possible refinement would be to use relative addresses, with a bit, somewhere in the state, to indicate whether the offset is relative to the top of the memory or to the bottom. This may help save a few bytes because if we suppose that we can’t create very large data sets (say 34 or 35 bits relative to either top or bottom), allowing us to use 5 bytes in almost all cases.

*
* *

Again, trading 8 bytes pointers for 5 or 6 bytes pointers doesn’t sound like much, but if you’re going to create 250 million pointers, this “pointless” optimization translates to 500 or 750MB. But that’s only considering the memory savings aspect. There are other effects (that I did not quantify, so what follows is based on hunches) such as line cache sharing and memory bandwidth that may impact your performance positively. Say you have a structure that, for reasons your own, need five pointers. On an hypothetical machines with 32-bytes wide cache lines, you’ll need two lines to be brought in cache to use the pointers. If you have 5 bytes pointers, they’ll all fit on a 32-bytes line and you’ll still have some place left for other data. This could result in a much faster program for no other reason than playing well with the cache.

3 Responses to Short Pointers

  1. Guillaume B says:

    Check out ObjC’s Tagged Pointers, Mike Ash has a very complete article showing what’s going on. It’s basically entire classes stored in pointers.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: