Pairing Functions

Sometimes you have to encode reversibly two (or more) values onto a single one. The typical example of a pairing function that encodes two non-negative integers onto a single non-negative integer (therefore a function f:\mathbb{Z}^*\times\mathbb{Z}^*\to\mathbb{Z}^*) is the Cantor function, instrumental to the demonstration that, for example, the rational can be mapped onto the integers.

In a more pragmatic way, it may be necessary to encode two values onto one for data compression purposes, or to otherwise exploit a protocol that does not allow many distinct values to be sent separately. The simplest way of pairing two integer is to interleave their bits, for example, with something like:

uint64_t pair(uint32_t x, uint32_t y)
{
  int64_t p=0;
  int i=0;
  while (x||y)
   {
    p|= ((uint64_t)(x&1)<<i);     x>>=1;
    p|= ((uint64_t)(y&1)<<(i+1)); y>>=1;
    i+=2;
   }

  return p;
}

with its inverse

void depair(uint64_t p, uint32_t & x, uint32_t & y)
 {
  x=0;
  y=0;
  int i=0;
  while (p)
   {
    x|=((uint32_t)(p&1)<<i); p>>=1;
    y|=((uint32_t)(p&1)<<i); p>>=1;
    i++;
   }
 }

This implementation, of course, works only with machine-size integers. The idea of interleaving bits (or digits) for a pairing function isn’t new—I remember reading about its invention in the 19th century, but I can’t remember the name of the mathematician. Other kind of pairing functions were proposed, and one that is particularly elegant—and impractical—was proposed by Gödel as a way of encoding programs. Known as Gödel Numbering, the scheme use the fundamental theorem of arithmetic to encode strings onto a unique number. That is, for a string x_1x_2\ldots{}x_n, the Gödel number is given by:

Go(x_1x_2\ldots{}x_n)=2^{x_1}3^{x_2}...p_n^{x_n}

Where p_n is the nth prime number. Since it is uniquely factorizable—it’s easy to divide the number per two until we get an odd remainder to find x_1—the code is reversible.

Gödel solved the general case of encoding n-tuples, but let us return to pairing functions, for 2-tuples, that is, plain tuples. We would have

G_2(x,y)=2^x 3^y

At first, it seems that having a pairing function based on integer factorization isn’t much of a good idea because factorization is hard, but this special case affords an especially efficient algorithm. In fact, we can factor the number in essentially O(\lg y) steps.

Think about it before you continue reading.

*
* *

Since the encoded number is very simple, it only has two distinct factors, 2 and 3, the complexity of finding x and y is greatly reduced. A naïve approach would successively divide by three, letting you find y, then successively divide by 2, letting you find x, yielding an algorithm essentially in O(\max(x,y)) steps (note that I am leaving the complexity of division out, to keep things simple, but you would have to include it were you to perform a complete analysis of the algorithm.)

But, since we already know the prime factors, just not their multiplicities, we can observe that if G(x,y)\:\mathrm{mod}\:3^k=0, for an integer k\geqslant{0}, then k\leqslant{y}. OK, that means that if the remainder of G(x,y)\:\mathrm{mod}\:3^k is zero, 3^k divides G(x,y) and therefore 3^y=3^k3^l, with l possibly zero. This will allow us to find y in O(\lg y) steps.

Because, if G(x,y)\:\mathrm{mod}\:3^k=0, then k\leqslant{y}, we can start with k=1, then double k until G(x,y)\:\mathrm{mod}\:3^{2k}\neq{0}, that is, 2k is too large for it to be a solution. This mean that y lays somewhere between k and 2k, which gives us a bisection range.

Both the “galloping phase” of successively doubling k and the bisection phase are performed in O(\lg y). Once y is obtained, obtaining x is easy: you can do just the same thing. Or use a fast \log_2 routine for large integers. Also performed in O(\lg x).

Since it’s quite cumbersome to deal with big integers in C or C++ (and that’s somewhat of an understatement), let us propose a Python implementation:


def godel(x,y):
    return (2**x)*(3**y)

def degodel_log(z):

    x,y=0,0

    ## "galloping" phrase
    lo_y=0
    hi_y=1
    while (z % 3**hi_y==0): lo_y=hi_y; hi_y*=2

    # ok, we know it's somewhere lo_y<=y<hi_y
    while (lo_y<hi_y):
        test_y=(hi_y+lo_y+1)/2
        if (z % 3**test_y):
            hi_y=test_y-1
        else:
            lo_y=test_y

    z/=(3**lo_y)
    x= int(math.log(z+0.01,2)) # numerical stability issue here
    return (x,lo_y)

8 Responses to Pairing Functions

  1. Roland Marquis says:

    Bonjour, est-ce que tu es certain que le code que tu proposes compile et fonctionne tel que tu le proposes?

    • Steven Pigeon says:

      Pour le code Python, il faut importer math.

      Pour le code C, il faut <stdint.h>, qui donne les int32_t et les autres (C99).

      • Roland Marquis says:

        Je n’ai pas compris les boucles exaustives
        Ex utilisant 32 bit au lieux de 64 mais la logique est la meme.

        #include 
        
        void depair(u_int32_t p, u_int16_t *x, u_int16_t *y)
        {
        
          *x=0;
          *y=0;
        
          u_int32_t temp = 0;
        
          temp = (p & 0xffff0000) >> 16;
          *x = (temp & 0x0000ffff);
          *y = (u_int16_t)(0x0000fffff & p);
        }
        
        
        u_int32_t pair(u_int16_t x, u_int16_t y)
        {
          u_int32_t p=0;
          u_int32_t temp=0;
        
          temp = (u_int32_t)(x & 0x0000ffff);
          p = temp << 16;
          p |=  y;
        
          return p;
        }
        
        
        int main(int argc,char **argv)
        {
          u_int32_t result= 0;
          u_int16_t base = 0;
          u_int16_t debase = 0;
        
          base = 1212;
          debase = 4343;
        
          result = pair(base,debase);
        
          base = 0;
          debase = 0;
        
          depair(result,&base,&debase);
        
          printf("Depair give %u : %u \n",
                 base,
                 debase);
        
          return 0;
        
        }
        
        • Steven Pigeon says:

          The exhaustive loops works by interleaving the bits from one number into the bits of the other. For example: if the first one is abcdef and the second one ghijkl, then the pairing function computes agbhcidjekfl. Just like when you shuffle cards.

        • Steven Pigeon says:

          Your proposition could also be written:

          int64_t pair2(int32_t x, int32_t y)
           {
            return ((int64_t)x)<<(sizeof(int32_t)*CHAR_BIT) | y;
           }
          
          void depair2(int64_t p, int32_t & x, int32_t & y)
           {
            y=p; // truncated!
            x=p>>(sizeof(int32_t)*CHAR_BIT);
           }
          

          where almost everything becomes compile-time constants. Or you could use something like an union:

          typedef 
           union
            {
             struct { int32_t lo,hi; };
             int64_t v;
            }  pair_t;
          
          int64_t pair3(int32_t x, int32_t y)
           {
            pair_t s;
          
            s.lo=y;
            s.hi=x;
            return s.v;
           }
          
          int64_t depair3(int64_t p, int32_t & x, int32_t & y)
           {
            pair_t s;
            s.v=p;
            y=s.lo;
            x=s.hi;
           }
          
          

          which now does not contain any explicit shift and leaves the compiler to figure out the best code to generate for this ;)

    • Steven Pigeon says:

      I modified the original to be safer (it now works on the complete intended range). Thanks for making me question that ;)

      • Roland Marquis says:

        Enfin, je ne comprend toujours pas l’aventage d’avoir les bit
        de deux entier s’entre croiser.

        Pour ce qui est d’utiliser des unions je ne suis pas pour,
        j’opterais beaucoup plus pour une forme utilisant une logique plus solide du type.

        Je n’apprecie pas vraiment les tentatives d’optimisation qui rendre le code source plus court et moin lisible.

        Bien heureux ceux qui l’adopte et qui ne redebug plus jamais leur code :)

        void pair2( void *high,void *low,void *retval,short intsize)
        {
        if( high == NULL ||
        low == NULL ||
        retval == NULL)
        {
        //zerror…
        return;
        }
        switch(intsize)
        {
        case sizeof(u_int8_t):
        //etc,,,,,,
        break;
        case sizeof(u_int16_t):
        //etc,,,,,,
        break;
        case sizeof(u_int32_t):
        //etc,,,,,,
        break;
        case sizeof(u_int64_t):
        //etc,,,,,,
        break;
        default:
        //zerror….
        return;
        break;
        }
        return;
        }

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: