Strings in C++ Switch/Case statements

Something that used to bug me—used to, because I am so accustomed to work around it that I rarely notice the problem—is that in neither C nor C++ you can use strings (const char * or std::string) in switch/case statement. Indeed, the switch/case statement works only on integral values (an enum, an integral type such as char and int, or an object type with implicit cast to an integral type). But strings aren’t of integral types!

In pure C, we’re pretty much done for. The C preprocessor is too weak to help us built compile-time expression out of strings (or, more exactly, const char *), and there’sn’t much else in the language to help us. However, things are a bit different in C++.

Since C++11, a function or a variable can be defined as constexpr, that is, guaranteed to have value entirely defined at compile-time. This allows the compiler to compute, at compile time, the values of expressions. (This, in turn, implies that the code “compiles away” and just leaves the result.) There are a number of requirements to make a function or an expression constexpr, but we can still use that to make switch/case work on strings (or const char *).

The switch/case statement itself will still require an integral operand, so we must transform a string into an integral value. The usual way of doing this is to use a hash function. Can we built a constexpr hash function?

Well, yes, obviously—otherwise I wouldn’t have bothered to write that much text. One limitation of constexpr functions is that they pretty much have to limit themselves to declarations, directives, and a return statement. A good way of using these limitations is to think functional programming, where recursion is the main control mechanism. To hash a string (or a const char *) we must scan every character at least once. This suggest a hash function such as:

uint64_t constexpr mix(char m, uint64_t s)
 {
  return ((s<<7) + ~(s>>3)) + ~m;
 }

uint64_t constexpr hash(const char * m)
 {
  return (*m) ? mix(*m,hash(m+1)) : 0;
 }

…where mix is the real hash function. I’m sure you can come up with something better, but as long as it is good enough, it’ll work just fine. So how do we use that in a switch/case? Most simply:

void switch_test(const char * str)
 {
  switch( hash(str) ) // run-time
   {
    case hash("tatatututoto"): // compile-time

     std::cout << "tutu!" << std::endl;
     break;

    case hash("tatatititoto"): // compile-time
     
     std::cout << "titi!" << std::endl;
     break;
     
    default:
     
     std::cout << "wut?" << std::endl;
     break;
   };
 }

int main()
 {

  switch_test("tatatititoto");
  switch_test("tatatututoto");
  switch_test("abababubabub");
  return 0;
 }

This indeed produces the desired output:

titi!
tutu!
wut?

*
* *

The main problem with this approach is the possibility of hash collisions. The above hash function isn’t very strong, very obviously, but can be dealt with by the compiler. If a collision occurs, it will be detected at compile-time and the compiler will issue a warning or an error about “duplicate case value”. In this case, the only correct workaround is to eliminate collisions by changing the hash function. Anyway, I’m sure you can do better than the mix hash function above.

9 Responses to Strings in C++ Switch/Case statements

  1. Luc says:

    We’ve found out that this kind of compile-time evaluations have a noticeable impact on compilation times.

    • That’s an objection that is often brought up when we’re dealing with templates. Personnally, unless it degenerates to ridiculously slow compilation, I consider it a non-issue: it’s run-time I’m most interested in.

  2. Marshall Clow says:

    You are counting on the idea that the hash of a string does not change from one run of your program to the next. In C++, that is not guaranteed: [hash.requirements]/2 “The value returned shall depend only on the argument k for the duration of the program.”

    • That’s true of std::hash (since c++14). Here, we have a static hash that is guaranteed to hash to the same value, everytime, even between runs. As for the switch itself, the labels are compile-time, so they can’t change between runs, so the compiler has to generate the right code once.

  3. Vishal Oza says:

    I agree that strings should be put in the switch statement I would suggest suggesting writing the proposal to the standard committee this would need to be both a library change (to support std::string directly) and a language change for the switch string statement.

  4. Marshall Clow says:

    LLVM has a “StringSwitch” class that implements the equivalent of a switch statement for constant strings. I haven’t looked into it to see how it is implemented, though.

  5. Jerry Coffin says:

    For what it’s worth, a few years ago, some of us on Stack Overflow implemented this general idea, but using DJ Bernstein’s hash function.

    http://stackoverflow.com/q/19120326/179910

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: