A Bit About Bit-Fields

Let’s make a detour through low-level programming this week. Let’s talk about bit-fields and some of their quirks.


Bit-fields are used to specify a group of integral values that each take an arbitrary number of bits. Normally an int, say, takes 32 bits, or whatever the normal size for int is on your system. This may be rather wasteful if you only need to store 20 different values, for which 5 bits are sufficient. Bit-fields allow you to define, within a region of memory, integers (or char or other integral types) with a given number of bits, from 0 bits (we’ll talk about 0 later) up to the number of bits the integral type takes.
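To make the 5-bit claim concrete, here is a minimal sketch. The struct and function names are made up for illustration, and the signed range assumes a two's-complement target (which C++20 guarantees and earlier compilers provide in practice).

```cpp
#include <cassert>

// Hypothetical struct for illustration; the names are not from the article.
struct small_values
{
  unsigned int level : 5;  // 5 bits, unsigned: holds 0..31
  signed int delta : 5;    // 5 bits, signed: holds -16..15 on two's-complement targets
};

// Stores a value into the 5-bit unsigned field and reads it back as an int.
// Unsigned bit-fields wrap modulo 2^5, so 32 comes back as 0.
inline int store_level(unsigned int v)
{
  small_values s{};
  s.level = v;
  return s.level;
}

// Stores a value into the 5-bit signed field and reads it back.
inline int store_delta(int v)
{
  assert(v >= -16 && v <= 15);  // out-of-range results were implementation-defined before C++20
  small_values s{};
  s.delta = v;
  return s.delta;
}
```

Calling store_level(19) gives back 19, since 20 distinct values fit comfortably in 5 bits, while store_level(32) gives back 0 because only the low 5 bits are kept.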

That is, you can define int x:5 to be a 5-bit int and short z:12 to be a 12-bit short. The declared type plays its role when the value is converted back to an “ordinary” integral type. If you group bit-fields into a struct, as below, you’ll have a packed representation without wasted space between bit-fields.

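#include <cstring> // std::memset

struct test_struct
 {
  char thingy:3;
  char gizmo:2;
  short :0; // sync to next alignment (implementation-specific!)
  int watchamacallit:5;

  // zeroes the whole memory region, padding included
  // (the cast to void* avoids G++'s class-memaccess warning)
  explicit test_struct(int)
   {
    std::memset(static_cast<void *>(this),0,sizeof(test_struct));
   }

  // zeroes only the named bit-fields; padding is left as it was
  test_struct()
   : thingy(0),
     gizmo(0),
     watchamacallit(0)
  {}
 };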

Or almost. An anonymous zero-width bit-field (as above, short :0) forces the next bit-field to start on a fresh boundary rather than right after the previous one. The C++ standard leaves the details implementation-specific, but in G++ 5.2.1 the boundary is the next alignment boundary for the integral type used in the anonymous zero-width bit-field. Above, it’s short, so the next field starts at the next alignment boundary for short. Something similar happens at the end of the struct. Above, the last bit-field is an int, and the struct is padded up to the next valid alignment for int. If all the bit-fields, including the last one, were char, the struct would only be rounded up to the next byte.
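A quick way to see what the zero-width bit-field costs on your own compiler is to compare two otherwise identical structs. This is only a sketch with hypothetical names, and the sizes it reports are whatever your implementation chooses.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical structs for illustration; only the zero-width field differs.
struct packed_fields
{
  unsigned char a : 3;
  unsigned char b : 2;
  unsigned char c : 3;  // 3 + 2 + 3 = 8 bits, so all three can share one byte
};

struct split_fields
{
  unsigned char a : 3;
  unsigned char b : 2;
  unsigned char : 0;    // zero-width: c must start in a fresh allocation unit
  unsigned char c : 3;
};

// Number of extra bytes the zero-width bit-field costs on this compiler.
inline std::size_t bytes_added_by_sync()
{
  assert(sizeof(split_fields) >= sizeof(packed_fields));  // guards the subtraction
  return sizeof(split_fields) - sizeof(packed_fields);
}
```

With G++ on x86-64 I would expect packed_fields to take one byte and split_fields two, but since the layout is implementation-specific, printing sizeof for both is the only way to be sure.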

Another thing one must worry about is initialization. Alignment and padding are implementation-specific, but the standard says that whatever is not explicitly part of the declaration (what’s not “participating in the value representation”) is not initialized. That is, the padding in the middle of the struct will not be initialized by the (default) constructor above. Nor will the padding at the end. If you want to zero the whole thing, you must resort to something like the other constructor, which uses memset to zero the complete memory region.
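If you want to check that the whole region really is zero, padding included, you can look at the raw bytes. The sketch below uses a constructor-free copy of the struct; plain_fields, all_bytes_zero and zero_all are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical struct: same fields as the one above, but without constructors.
struct plain_fields
{
  char thingy : 3;
  char gizmo : 2;
  short : 0;
  int watchamacallit : 5;
};

// True if every byte of the object, padding included, is zero.
inline bool all_bytes_zero(const plain_fields &f)
{
  unsigned char raw[sizeof(plain_fields)];
  std::memcpy(raw, &f, sizeof raw);  // copy out the object representation
  for (std::size_t i = 0; i < sizeof raw; ++i)
    if (raw[i] != 0) return false;
  return true;
}

// Zeroes the whole memory region, not just the named bit-fields.
inline void zero_all(plain_fields &f)
{
  std::memset(&f, 0, sizeof f);
  assert(all_bytes_zero(f));
}
```

After zero_all(f), every named field reads as 0 and all_bytes_zero(f) holds. An object whose fields were merely assigned 0 one by one gives no such guarantee for its padding, which matters if you later compare or hash the raw bytes.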

*
* *

Bit-fields are mostly used to store information in a way that minimizes the amount of memory/storage used, without resorting to full-blown data compression techniques. For example, some of the image and data-compression information stored in the header of a GIF file is represented as bit-fields. This allows several fields to be encoded in a single byte.
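As an illustration, the packed byte of GIF's logical screen descriptor holds four fields: a global color table flag, a 3-bit color resolution, a sort flag, and a 3-bit table size. Here is a sketch of packing them with a bit-field. It assumes the compiler allocates bit-fields starting from the low bit, as G++ and Clang do on little-endian machines, and the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Fields are declared low bit first to match the assumed compiler layout.
struct gif_packed
{
  unsigned char table_size : 3;        // bits 0-2: size of the global color table
  unsigned char sorted : 1;            // bit 3: sort flag
  unsigned char color_resolution : 3;  // bits 4-6: color resolution
  unsigned char has_global_table : 1;  // bit 7: global color table flag
};

static_assert(sizeof(gif_packed) == 1, "the four fields should share one byte");

// Packs the four values into the single byte stored in the file.
inline unsigned char pack_gif_flags(bool has_table, unsigned resolution,
                                    bool sorted, unsigned table_size)
{
  assert(resolution < 8 && table_size < 8);  // each must fit in 3 bits
  gif_packed p{};
  p.table_size = table_size;
  p.sorted = sorted;
  p.color_resolution = resolution;
  p.has_global_table = has_table;
  unsigned char byte;
  std::memcpy(&byte, &p, 1);  // read the struct's single byte
  return byte;
}
```

Under that layout assumption, pack_gif_flags(true, 7, false, 7) yields 0xF7. On a compiler that allocates bit-fields from the high bit, the declarations would have to be reversed, which is why portable file readers often fall back to shifts and masks.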

They are also used to map to hardware-specific registers. For example, in the olden times of PC programming, the OS let you directly access all kinds of hardware registers. The PS/2 keyboard controller chip, the 8042, would expose its status register as a port (essentially a special type of memory location) with different bits reporting or configuring activity in the controller. The sane way of interfacing with the device would be to define a bit-field that maps to the register, access the bits through the bit-field, and let the compiler figure out the best code to access them, rather than anding-oring-shifting ourselves.
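Here is roughly what that looks like for the 8042 status register. Reading the actual port is left out, so the sketch only decodes a byte that has already been read. The names are hypothetical, bits 4 and 5 are lumped together because their meaning varies between controllers, and the layout again assumes a compiler that allocates bit-fields from the low bit.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical view of the 8042 status byte, declared low bit first.
struct kbc_status
{
  unsigned char output_full : 1;    // bit 0: a byte is waiting to be read
  unsigned char input_full : 1;     // bit 1: controller has not consumed the last write
  unsigned char system_flag : 1;    // bit 2: system flag
  unsigned char is_command : 1;     // bit 3: last write was a command, not data
  unsigned char specific : 2;       // bits 4-5: meaning varies between controllers
  unsigned char timeout_error : 1;  // bit 6: timeout error
  unsigned char parity_error : 1;   // bit 7: parity error
};

static_assert(sizeof(kbc_status) == 1, "the status view should be one byte");

// Reinterprets a raw status byte as named flags.
inline kbc_status decode_status(unsigned char raw)
{
  kbc_status s;
  std::memcpy(&s, &raw, 1);
  assert(s.output_full == (raw & 0x01u));  // layout check: fails if the bit order differs
  return s;
}

// The anding-and-shifting equivalent of reading s.output_full.
inline bool output_full_by_mask(unsigned char raw)
{
  return (raw & 0x01u) != 0;
}
```

With this, decode_status(raw).parity_error reads much better than (raw >> 7) & 1, and the compiler generates the masking for us.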
