swappity::swap

constexpr is one of the C++11 features I’m starting to like very much. It is a bit finicky, but it allows you to evaluate functions, including constructors, at compile time. This, of course, allows computations to be replaced directly by their results.

So in the best of cases, you could end up with less code, or better yet, no code at all!
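Here’s a quick illustration of the idea (my own toy example, not part of the swapping code): with a constexpr constructor, a whole object and a computation on it can be evaluated at compile time, so the result can be baked directly into the binary.

struct point
 {
  int x,y;
  constexpr point(int x_, int y_) : x(x_), y(y_) {}
  constexpr int dot(const point & o) const { return x*o.x+y*o.y; }
 };

constexpr point p(3,4);
static_assert(p.dot(p)==25,"computed entirely at compile time");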

So let’s consider a toy problem: managing endian conversions. In many contexts, integers are represented in “network order” (big endian), which may or may not be the internal representation of the machine manipulating the data. Intel CPUs, for example, are little-endian machines.

Of course, there are already standard C implementations for those (in <netinet/in.h>, or maybe in some other header depending on your compiler/OS): hton* (host-to-network) and ntoh* (network-to-host). However, these are C-style functions that the compiler may treat as opaque calls and fail to optimize through. Granted, they’re not very compute-intensive, but we may be able to do better for constant arguments and allow the compiler to optimize in depth, maybe to the point of not generating code at all.
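For reference, here is what the classic C API looks like in use (a trivial sketch of my own; on POSIX systems the declarations live in <arpa/inet.h>, elsewhere they may come from <netinet/in.h> or <winsock2.h>):

#include <arpa/inet.h>
#include <cstdint>
#include <iostream>

int main()
 {
  uint32_t host=0x0a0b0c0d;
  uint32_t net=htonl(host); // host-to-network: big endian on the wire
  std::cout << std::hex << host << " -> " << net << std::endl;
  return ntohl(net)==host ? 0 : 1; // round-trips back to the host value
 }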

For this, constexpr comes to the rescue! As I said earlier, it will evaluate as many things as it can at compile time. But, unlike the preprocessor (those evil #define statements), it will honor all of C++’s semantics. #defines are hardly anything more than find-and-replace macros, which may lead to a number of surprises.
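The canonical example of such a surprise (again, my own illustration, with made-up names): a macro is substituted textually, so operator precedence in the argument bites you, while a constexpr function behaves like any other function:

#include <cassert>

#define SQUARE_MACRO(x) x*x                 // textual substitution, no semantics
constexpr int square(int x) { return x*x; } // a real function, evaluated at compile time when possible

int main()
 {
  assert(SQUARE_MACRO(1+2)==5); // expands to 1+2*1+2
  assert(square(1+2)==9);       // does what you meant
  static_assert(square(5)==25,"usable in constant expressions too");
  return 0;
 }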

*
* *

How do we detect endianness at compile time? We can create a constexpr integer constant initialized to 1 and test its first byte: 0 means the CPU is big endian, 1 means it is little endian. The rest is just swapping bytes around, if we need to.

#ifndef __MODULE_SWAPPITY__
#define __MODULE_SWAPPITY__

#include <cstdint>

//#undef __GNUG__ // see what happens!

class swappity
 {
  private:
  
      constexpr static int x=1;

  public:

      // true if the host is little endian (low-order byte stored first),
      // in which case conversion to/from network order requires swapping
      constexpr static bool will_swap() { return (*(char*)&x); }

      constexpr static uint8_t swap(uint8_t x) { return x; }

      constexpr static uint16_t swap(uint16_t x)
       {
        return will_swap() ?
#ifdef __GNUG__
         __builtin_bswap16(x)
#else
         ((x>>8)|(x<<8))
#endif
         : x;
       }

      constexpr static uint32_t swap(uint32_t x)
       {
        return will_swap() ?
#ifdef __GNUG__
         __builtin_bswap32(x)
#else
         (swap((uint16_t)(x>>16)) | ((uint32_t)swap((uint16_t)x)<<16))
#endif
         : x;
       }

      constexpr static uint64_t swap(uint64_t x)
       {
        return will_swap() ?
#ifdef __GNUG__
         __builtin_bswap64(x)
#else
         (swap((uint32_t)(x>>32)) | ((uint64_t)swap((uint32_t)x)<<32))
#endif
         : x;
       }
 };

#endif
// __MODULE_SWAPPITY__

The only really fancy thing about this implementation is the use of GCC-specific intrinsics. If the code is compiled by G++, then __GNUG__ is defined, and the intrinsics are used. Otherwise, “ordinary” bit-twiddling is used. The funny thing is, there doesn’t seem to be a significant difference in speed either way:

#include <sys/time.h>
#include <iostream>
#include <iomanip>
#include "swappity.hpp"

uint64_t now()
 {
  struct timeval t;
  gettimeofday(&t,0);
  return t.tv_sec*1000000+t.tv_usec;
 }

int main()
 {
  uint32_t s=0; // to fool optimizations (unsigned, so the repeated additions wrap instead of overflowing)
  uint64_t start=now();
  for (uint32_t l=0,z=0;l<=z;l=z,z++)
    s+=swappity::swap(z);
  uint64_t stop=now();

  std::cout
   << (stop-start)/1000000.0 << 's' << std::endl
   << s << std::endl;
  
  return 0;
 }

This program runs in 3.55±0.02s with __GNUG__ defined, and in …3.57±0.01s without. It basically makes no difference—except that intrinsics are cool.
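And for constant arguments, that’s the whole point of the exercise: with optimizations on, a call like the one below should fold down to a pre-swapped constant in the generated code rather than an actual call. The function name and value here are just an illustrative sketch of mine.

#include <cstdint>
#include "swappity.hpp"

// hypothetical use: emitting a length field in network order
uint32_t encoded_length()
 {
  return swappity::swap((uint32_t)512); // expected to compile down to a constant
 }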
