Smaller enums

As you may have noticed, efficient representation of information and data structure is kind of a hobby of mine. I often look at ways I can reduce the memory footprint, and, as often, it’s the small details that are the most annoying, like, for example, enums that use up pretty much anything the compiler feels like.

Indeed, the standard does not require that the compiler uses the smallest type, but merely one that can hold all values (§7.2.6, in ISO/IEC 14882:2011(E)), so you end up with something “convenient”, that is, int. Unless, of course, you do specify storage.

In C++11, enums got better in two important ways. First, you can define enum class T, which introduces a strongly-type set of enums values, instead of C-style smooshy-woolly-weakly kind-of-int enums. Second, you can specify an integral (int-like) type for storage. It’s then your responsibility to choose that integral type wisely. Well, in any case, big enough to accommodate all enumerated values.

So you can write something like:

enum class thingy:uint8_t { zoid, zigzig, zwoink };

and it will define a strong enum type thingy, and its storage will take only one byte. Except that in a class (or a struct), the compiler will likely align however it sees fit. So you may have 7 unused bytes following it if the next member is 8-byte aligned (say, a pointer). But OK, now we could still better use memory with a storage specifier.

But a byte still looks somewhat wasteful when we only have, like in the above example, just a few distinct values. For three values, we’d need only 2 bits (well, that gives us four combinations), and we should be able to pack that into a bitfield.

But bitfields only use integral values. Strong enum classes aren’t quite integral anymore, they’re enum classes! More, C++ doesn’t directly give us the means to enumerates the enums, nor to know its maximum and minimum enumerated value. Let’s see how we could somehow fix that.

To have something that allows us to find the maximum and minimum values of an enum, we can overload std::numeric_limits<T&ht; (yes, we’re allowed to, §, ISO/IEC 14882:2011(E)). A somewhat naïve implementation would be something like this:

namespace std
  template<> class numeric_limits<thingy>
       static constexpr thingy max() { return thingy::zwoink;}
       static constexpr thingy min() { return thingy::zoid; }
       static constexpr int digits() { return bits_needed((int)max()); }

To enumerate the values, which we don’t need right away, we could do something like this. But to know how many bits to use in our bitfield, we must at least know the largest value of the enum, then use this to compute the number of bits. Fortunately for us, it’s not that hard:

constexpr size_t bits_needed(size_t max_value)
  return (max_value<=2) ? 1 : 1+bits_needed((max_value+1)/2);

Where bits_needed(2) should be 1 (of course, bits_needed(1) should be zero, and bits_needed(1) is undefined, but since they’re not very useful case, we needn’t worry about those). Note that it’s not quite a log function, 8 returns 3 because 8 different values can be encoded on 3 bits, while 9 will return 4. The constexpr part will ensure that we can use this at compile-time.

Now, let’s come back to those bitfields. If we write

struct blorp
  // warning only! is too small to hold all values of ‘enum class thingy’
  thingy y: 2;

It will still work, but the compiler complains as the declared storage is uint8_t, as it would with the default storage (int?). However this code is brittle as any change to thingy may require you to revisit your code and change that 2 for something else. However, after having overloaded std::numeric_limits, we can write:

struct blorp
  thingy y: bits_needed((size_t)std::numeric_limits<thingy>::max());

which is now a bit more robust.

* *

However, there’s still a lot to do. If the enum isn’t compact (say you have something like enum class type { a=1,b=2,c=1231,d=123121 }), bits_needed will return the largest enum value, not the number of values; and the number of bits used in the bitfield will be computed from it. We could place a codec between the bitfield and the enum which would map the arbitrary enum values to 0,1,… in a compact way. This still may be insufficient, because it also happens that enums are used to define bit patterns to be combined.

2 Responses to Smaller enums

  1. h0nzZik says:

    Just a few minor details:
    1. In the last example, why do not use std::numeric_limits::digits(), if you have defined it?
    2. I think the compiler is right in emitting the warning, since according to section 7.2:8 in the C++14 standard, the set of values of an ‘enum class’ is a set of values of its underlying type.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s