Square Roots (Part V)

Last week we examined the complexity of obtaining k in the decomposition n=2^k+b for some integer n. This week, we’ll have a look at how we can use it to speed-up square root extraction. Recall, we’re interested in k because

2^k \leqslant 2^k+b < 2^{k+1},

with 0 \leqslant b < 2^k, which allows us to get easy bounds on \sqrt{n}. Better, we also have that

\sqrt{2^k} \leqslant \sqrt{2^k+b} \leqslant \sqrt{2^{k+1}},

and we know how to compute \sqrt{2^k}=2\frac{k}{2} (somewhat efficiently! Let’s combine all this and see how it speeds up things.

Most of the code for square roots is found here, and for shifting, including by half a bit, it is found here.

Using 2^\frac{k}{2} and 2^\frac{k+1}{2} as lower and upper bound for binary search will provide better performance because we reduce the number of iterations from, say 16 for 16-bits integers to about \frac{k}{2}, where k depends on n.

unsigned driven_binary_sqrt_bounds(unsigned n, int & i)
 {
  i=0;
  if (n)
   {
    unsigned k=std::numeric_limits<unsigned>::digits-__builtin_clz(n);
    unsigned l=shift_half(1<<k); // OK, truncates
    unsigned h=shift_half(1<<(k+1))+1; // rounding error from half shift
    while (l!=h)
     {
      i++;
      unsigned m=(l+h)/2;
      if (m*m > n)
       h=m;
      else
       l=m+1;
     }

    return l-1;
   }
  else
   return 0;
 }
 

As for Newton’s method, we can use the knowledge of k, and \frac{k}{2} to kick-start search at a better point than \frac{n}{2}. The modification to the algorithm aren’t extensive:

unsigned driven_newton_sqrt_bounds(unsigned n, int & i)
 {
  i=0;
  if (n)
   {
    unsigned k=std::numeric_limits<unsigned>::digits-__builtin_clz(n);
    unsigned l=shift_half(1<<k); // OK, truncates anyway
    unsigned h=shift_half(1<<(k+1))+1; // rounding error from half shift
    unsigned x=(l+h)/2; // should be pretty close
    
    unsigned d;
    do
     {
      i++;
      d = n/x;
      x = (x+d)/2;
     }
    while (d<x);

    // precision fix: it sometimes rounds up (42.9767
    // becomes 43, which is the best answer, but) we
    // need truncation (as for the other methods)
    if (x*x>n) x--;
    return x;
   }

  else return 0;
 }

The last correction can be left out if rounding is deemed OK.

*
* *

How does the knowledge of k helps finding square roots? As we did before, we just instrumented the code to obtain the number of iterations performed in the principal loop. We will use this as a metric of performance rather than, say, raw CPU time. Why so? Because counting the performance of a specific implementation is one thing, and as we showed a couple of weeks ago, shifting by half a bit is a trade-off of speed and precision, and we cannot exclude that someone would go through the trouble of implementing his own CPU with a “shift by half a bit” instruction. Let us, again, refer to this previous post for the detail of all the methods. Now, compare (Y-axis, number of iterations, X-axis, numbers):

sqrt-2kb-compared

where the “Bounded” versions are the version just above.

We see that the use of k reduces greatly the number of iterations for the binary search method. Again, the number of iteration will be about \frac{k}{2}. As for the Newton method, using an initial guess of

\displaystyle x=\frac{1}{2}\left(2^{\frac{k}{2}}+2^{\frac{k+1}{2}}\right)

somewhat reduces the number of iterations further from the “driven” Newton search and quite a lot from the naïve version where the initial guess is merely \frac{n}{2}.

*
* *

Unsurprisingly, then, having better initial guesses is a plus. I’m not sure, though, that the speed-up, as measured by the number of iterations, is worth the added complexity, even if it is a tiny amount. We’ve seen that using a built-in function that maps to an efficient instruction (__builtin_clz that maps to the CPU instruction bsr) is much faster than the equivalent naïve implementation so in the end the added complexity might be beneficial.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: