Fast exponentiation

A while ago, we looked at the computation of the GCD and found that the simplest implementation was, by far, the fastest. This week, we’ll have a look at another numerical problem: fast exponentiation (modulo machine-size integers).


The exponentiation a^b looks like an expensive operation because, at first glance, it seems to need b multiplications. Fortunately for us, thanks to the arithmetic of exponents, we can compute a^b in O(\lg b) steps, which is much, much faster, even when b is smallish. Let's see how.

Let’s start by reviewing how exponents work:

\displaystyle x^n=\underbrace{x\times{x}\times{x}\cdots\times{x}}_{n\text{~times}},

x^a x^b = x^{a+b},

(x^a)^b = x^{ab}.

These properties let us write x^{20} either as (x^5)^4, or as x^{10}x^8x^2, or, as we will do, as x^{16}x^4. The decomposition into powers of two is especially convenient: each power is obtained by simple squaring, and the set of powers used maps directly onto the binary representation of the exponent. For example, 25_{10}=11001_2, and indeed x^{25}=x^{16}x^{8}x: the powers that appear correspond exactly to the bits set to 1. We can exploit this to produce a general procedure.
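To see the saving concretely for x^{25}: building the squaring chain takes four multiplies, and combining the three needed powers takes two more, for six multiplies in total instead of twenty-four:

\displaystyle x\to x^2\to x^4\to x^8\to x^{16},\qquad x^{25}=x^{16}\cdot x^{8}\cdot x.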

We can write it either recursively,

unsigned expo_rec(unsigned x, unsigned p)
{
  if (p)
    {
      if (p&1)
        return x*expo_rec(x,p-1);
      unsigned t=expo_rec(x,p/2);
      return t*t;
    }
  return 1;
}

or iteratively,

unsigned expo_iter(unsigned x, unsigned p)
{
  unsigned t=x, e=1;

  while (p)
    {
      if (p&1) e*=t;
      t*=t;
      p>>=1;
    }

  return e;
}

In both cases, the code is really simple, but we should suspect the second, iterative, version to be much faster. Is it?

* *

Again, the best course is to take a quantitative approach and actually measure the thing. The following function measures the time taken to compute all combinations of power and base (yes, that's the technical name for the number being raised to a power) from 0 to range-1. The code is as follows:

#include <stdint.h>
#include <sys/time.h>

typedef unsigned (*expo)(unsigned,unsigned);

uint64_t now()
{
  struct timeval tv;
  gettimeofday(&tv,0);
  return tv.tv_sec*1000000+tv.tv_usec;
}

uint64_t test( unsigned range,
               expo f)
{
  uint64_t start=now();
  for (unsigned a=0;a<range;a++)
    for (unsigned b=0;b<range;b++)
      f(a,b);
  uint64_t stop=now();
  return stop-start;
}

Plugging the two versions above into the test code with a range of 10000 yields the following times:

expo_rec 3.45s
expo_iter 1.11s

…thus leaving no doubt as to which is faster. Let's have a look at the code for expo_iter generated by gcc -O3 (4.8.1):

  400ac0:       85 f6                   test   esi,esi
  400ac2:       b8 01 00 00 00          mov    eax,0x1
  400ac7:       74 1c                   je     400ae5 
  400ac9:       0f 1f 80 00 00 00 00    nop    DWORD PTR [rax+0x0]
  400ad0:       89 c2                   mov    edx,eax
  400ad2:       0f af d7                imul   edx,edi
  400ad5:       40 f6 c6 01             test   sil,0x1
  400ad9:       0f 45 c2                cmovne eax,edx
  400adc:       0f af ff                imul   edi,edi
  400adf:       d1 ee                   shr    esi,1
  400ae1:       75 ed                   jne    400ad0 
  400ae3:       f3 c3                   repz ret 
  400ae5:       f3 c3                   repz ret 
  400ae7:       66 0f 1f 84 00 00 00    nop    WORD PTR [rax+rax*1+0x0]
  400aee:       00 00 

Except for the alignment-forcing nops and the strange-looking repz ret, the code is very straightforward.

* *

The next step is to replace unsigned by an abstract type T, that is, to use templates and an arbitrary-precision number class that properly overloads the needed operators. Right now, the code computes exponentiation modulo the size of a machine-specific int (or, more precisely, unsigned), and that's what I needed for now. Since the complexity of raising a base a to a power b is O(\lg b) (more exactly, the number of multiplies is the number of bits in the exponent, plus the number of bits set to 1 in it), it's really convenient to use in, say, a hash function for lookups.
