#defines are EVIL

The C (and C++) preprocessor is a powerful but dangerous tool. For sure, it helps with a number of tasks, from conditional code inclusion to explicit code generation, but it has a few problems. In fact, more than a few. It is evil.


The C preprocessor (hereafter CPP) should be used with extreme care. For one thing, the CPP knows nothing about the language it is applied to; it merely translates its input using very simple rules, and this can lead to tons of hard-to-detect, hard-to-fix problems.

There are at least two problems I can think of that render the use of macros as functions dangerous and annoying. The first is that since the CPP is basically a big find-and-replace machine, it is not particularly smart about how macros are expanded. The second is that since the CPP is basically a big find-and-replace machine, it is not particularly smart about when macros are expanded.

Macros used as functions are declared like this:

#define function(arg1,arg2) ...stuff with arg1 and arg2...

(and the number of arguments can vary, but two is a good example) and are invoked:

y = function(x+3,4);

As a normal function would be, most of the time. The problem is, they're not functions; they're merely textual substitution patterns. The above will textually substitute the values for arg1 and arg2 into the macro body and paste the result at the invocation point. The expansion is so basic that in most cases you will need extra parentheses to make sure that operator precedence is respected. For example:

#define function(arg1,arg2) arg1 + arg2
...

x=function(x << 3, 5)

will result in bad behaviour: operator << has a lower precedence than +, so you'll end up with x << (3+5) rather than the expected (x<<3)+5. The correct way is to define:

#define function(arg1,arg2) ( (arg1) + (arg2) )

So as to force the correct precedence in the final expression.
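
To make the difference concrete, here's a minimal sketch (the names unsafe_add and safe_add are mine, purely for illustration) showing both expansions side by side:

#include <iostream>

#define unsafe_add(arg1,arg2) arg1 + arg2             // no parentheses
#define safe_add(arg1,arg2)   ( (arg1) + (arg2) )     // fully parenthesized

int main()
 {
  int x=2;

  int bad  = unsafe_add(x << 3, 5);  // expands to x << 3 + 5, parsed as x << (3+5), so 512
  int good = safe_add(x << 3, 5);    // expands to ( (x << 3) + (5) ), so 21

  std::cout << bad << " " << good << std::endl;

  return 0;
 }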

This is a somewhat simple case, and any programmer who has had his fingers bitten once by that kind of bug knows to put in the parentheses, and that's about it. However, while adding extra parentheses helps with operator precedence, there is another problem it does not solve. Consider:

#define max(a,b)  ( (a) > (b) ? (a) : (b) )

...

x=max(a[i++],b[j++])

In this code, you cannot easily predict how many times i++ or j++ are executed. Inspecting the macro expansion, we see that the code is now:

x=( (a[i++])>(b[j++]) ? (a[i++]) : (b[j++]) )

Which isn’t the desired result at all! In short, macros used as functions are evil. Because macro expansion is dumb, arguments may be evaluated any number of times. Because macro expansion is dumb, arguments may be evaluated in any order, and their expressions may even merge with the surrounding expression to yield an unexpected result, as in the shift example above.
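
Here is a minimal, self-contained sketch of the double-evaluation problem (the array contents are arbitrary; they are chosen only so that the second operand wins the comparison):

#include <iostream>

#define max(a,b)  ( (a) > (b) ? (a) : (b) )

int main()
 {
  int a[]={1,5,2};
  int b[]={4,3,6};
  int i=0, j=0;

  int x = max(a[i++],b[j++]);  // expands to ( (a[i++]) > (b[j++]) ? (a[i++]) : (b[j++]) )

  // since b[0] > a[0], the second operand is evaluated twice: j ends up at 2, not 1,
  // and x is b[1]==3 instead of the expected 4
  std::cout << "x=" << x << " i=" << i << " j=" << j << std::endl;  // prints x=3 i=1 j=2

  return 0;
 }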

*
* *

The other major problem with the CPP is that it doesn’t understand scoping or namespaces. For example, the following code spews compilation errors:

...

#define max(a,b) ...
...
class A
 {
  private:

    int a,b;

    ...

  public:

  int max() const { ... } // clearly a member function

  ...
 };

Because max() looks like a function call, the CPP tries to match it against the macro, and the compiler complains that macro.cpp:19:12: error: macro "max" requires 3 arguments, but only 1 given (why one and not zero? I don’t know! That’s what G++ returns). Even a non-function macro can cause a lot of problems:

#define shift 3


int function( int a, int b)
 {
  int shift=0; // clearly a local variable
  ...more code...
 }

 ...

 

This time, it complains with:

macro.cpp:28: error: expected unqualified-id before numeric constant

Which is somewhat cryptic, until you realize that after macro expansion the declaration reads int 3=0;, which is obviously not a valid declaration.

The same happens with namespaces. Qualifying a symbol with an explicit namespace doesn’t stop the CPP from expanding macros whenever something looks like a match:

#include <algorithm>
#include <iostream>

#define max(a,b,c) ...stuff...

// ...lot more code goes here...

int main()
 {
  int a=0;
  int b=3;

  std::cout << std::max(a,b) // clearly NOT the macro
            << std::min(a,b)
            << std::endl;
 }

The CPP tries to expand the macro max(), and compilation fails:

macro.cpp:12:28: error: macro "max" requires 3 arguments, but only 2 given
macro.cpp: In function ‘int main()’:
macro.cpp:12: error: no match for ‘operator<<’ in ‘std::cout << std::max’
-- 20 more lines of errors follow --
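
As an aside, there is a standard escape hatch (it comes up again in the comments below): a function-like macro is expanded only when its name is immediately followed by an opening parenthesis, so wrapping the function name in parentheses suppresses the expansion. A sketch, assuming the same three-argument max macro is in effect:

std::cout << (std::max)(a,b)   // 'max' is not followed by '(', so the macro is not expanded
          << std::endl;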

*
* *

The first solution is to avoid macros as much as possible. In C++, prefer templates and inline functions to macros used as functions. The first advantage of using a true function is that the arguments are evaluated only once. If a macro argument contains code with a side effect (like i++), it is not easy to predict how many times it will be executed in the expanded statement.
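
For instance, the max macro from earlier is trivially replaced by a small function template. A minimal sketch (the name my_max is mine; std::max already provides exactly this):

#include <iostream>

template <typename T>
inline const T & my_max(const T & a, const T & b)
 {
  return (a>b) ? a : b;   // the arguments are evaluated exactly once, by the caller
 }

int main()
 {
  int a[]={1,5,2};
  int b[]={4,3,6};
  int i=0, j=0;

  int x = my_max(a[i++],b[j++]);  // i and j are each incremented exactly once

  std::cout << "x=" << x << " i=" << i << " j=" << j << std::endl;  // prints x=4 i=1 j=1

  return 0;
 }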

Inline functions (available in C++ and C99, and in older C as a compiler-specific extension) solve all the problems of macros used as functions. First, they ensure that the parameters are evaluated only once. Second, they offer complete function semantics, which macros do not; for example, you can’t cleanly open a local scope and return a value from a macro. Third, they are always syntactically safe, yet another thing that is not guaranteed by macros, especially when they are used in compound statements.

The fact that inline functions are functions, and may result in an actual function call if the compiler can’t inline them, should really not stop you from using them. First, if the function is big enough that the compiler can’t (or won’t) inline it, you certainly don’t want it as a macro either. Second, the cost of a function call is usually dominated by the time it takes to evaluate the arguments, so the overhead is ultimately negligible.

*
* *

Since not all code bases seem to be aware of the problems inherent to the CPP, you may have to deal with stupid macro names, even in include guards. Paradoxically, a macro named max is a lot more stupid than a macro named my_max_macro, as it is much more likely to interfere with user code. The defensive solution to this problem is to undefine macros known to cause problems:

#include <some-header>

#ifdef max
 #warning 'max' is defined as a macro. Undefining.
 #undef max
#endif

...more code...

Or you can simply #undef it quietly. I prefer the warning, because it informs the programmer or maintainer that this was deliberate and that he should not expect the macro max to be available in this translation unit.

The proactive solution is to use smarter names for macros. You can of course use BIG_UGLY_CAPS for macros, but you can also use the underscore to specify that this is a compiler- or library-specific symbol, as suggested by the standards. The macro _max is already much better than just max, even though it may still interfere with some naming conventions. Note that double underscores are reserved for the implementation, for compiler-specific extensions such as __attribute__ and the like (see the discussion in the comments below). A macro named __max__ would imply that the macro is somehow special and compiler-specific.
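
If a macro it must be, a project- or library-specific prefix in BIG_UGLY_CAPS (MYLIB_ below is just an illustrative placeholder) already makes collisions with user code a lot less likely:

#define MYLIB_MAX(a,b)  ( (a) > (b) ? (a) : (b) )   // prefixed and capitalized: unlikely to collide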

*
* *

So, basically, the CPP is a good tool for testing the environment, checking for defined macros, and conditional compilation, but a very, very, very bad tool for code generation. I can understand that it is tempting to use macros in C (and C99) as a weak substitute for meta-programming, as the language itself provides no such facilities.

In C++, however, we have all the tools needed: function and operator overloading, and powerful meta-programming through templates. The careful use of C++ meta-programming can lead to very efficient, compile-time optimized code.
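
As a taste of what this looks like, here is a minimal, C++03-style sketch (static_max is an illustrative name of mine) where the maximum is computed by the compiler itself, with full respect of the language’s rules, rather than by the preprocessor:

// a compile-time maximum as a template meta-function
template <int a, int b>
struct static_max
 {
  enum { value = (a>b) ? a : b };
 };

// evaluated entirely at compile time, for example to size a buffer:
char buffer[ static_max<sizeof(int),sizeof(long)>::value ];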

10 Responses to #defines are EVIL

  1. Fanael says:

    Well, double underscores are actually reserved.

    • Steven Pigeon says:

      ISO/IEC 9899:1999 states, § 6.10.8 ¶ 4 :

      Any other predefined macro names shall begin with a leading underscore followed by an uppercase letter or a second underscore.

Which states the rule in one direction, (if predefined) then (has underscores), and not “if, and only if”. Elsewhere, § 7.1.3 ¶ 1 enumerates the uses of _ (followed by a capital letter) or __: they can be used as part of an identifier, as long as they are used as would-be reserved symbols; the gist of which is explained in note 157, at the foot of p. 181. The way I understand it is that a symbol that begins with _ or __ should be a reserved-looking symbol, at least in some compiler- or library-specific way. This includes, I would guess, include guards and library extensions such as _strdup.

§ 7.1.4 explains how to use and short-cut macros (namely, the use of parentheses around problematic symbols; in our example, we would have to write (max)(a,b) everywhere, which is still cumbersome).

      Annex J.2 states that undefining a macro beginning in _ or __ may result in undefined behavior (p. 510).

      Annex J.5.2 states (p.511), again, that compiler- or library-specific extensions (such as new keywords, pragma-looking directives, etc.) should begin by at least one underscore. Something similar is found in annex J.5.12 as well (p. 512)

So my interpretation is that compiler- and library-specific macros must be implementation-specific-looking, by beginning with _ or __; that undefining an implementation-specific-looking symbol beginning with _ or __ is risky and may result in implementation-specific behavior (i.e., “undefined behavior”). And, lastly, the rule is rather one-way: you may define implementation-specific-looking symbols yourself using _ or __; and if you do define such symbols, you MUST use _ or __.

  2. systemfault says:

Even though we’re also using the CPP in C++, the rules for names are somewhat different; see the following section of 14882:2003:

17.4.3.1.2 Global names [lib.global.names]
1 Certain sets of names and function signatures are always reserved to the implementation:
— Each name that contains a double underscore (__) or begins with an underscore followed by an uppercase letter (2.11) is reserved to the implementation for any use.
— Each name that begins with an underscore is reserved to the implementation for use as a name in the global namespace.

    Nice article, thanks :)

  3. Steven Pigeon says:

    There’s also ISO/IEC 14882:2003 § 17.4.3.1.3 ¶ 1 that’s new to me:

    Each name having two consecutive underscores (2.11) is reserved to the implementation for use as a name with both extern "C" and extern "C++" linkage.

However, does it mean my__function or __my_function? Both?

  4. Fanael says:

Probably both, despite the fact that only the second will be reserved in C. Yet another example of C++’s “compatibility” with C.

    Don’t blame me for my English, I’m Polish ;)

    • Steven Pigeon says:

      Oh, your English is quite good.

      Yes, the fact that C and C++ are somewhat compatible (and, contrary to what many people think, C++ is not a superset of C, C has new stuff that isn’t in C++) is still a (small) source of problems. g++, for example, has a number of extensions to accommodate C99 constructs that haven’t made their way into the C++ standard.


  8. A Joker says:

    #define EVIL saintly
