Safer Integer Constants

Consider the following short C (or C++) program:

const int thingy=123123123123;

Depending on your compiler, the above code may succeed, fail, produce a warning, or be accepted quietly and result in undefined behavior, which is extremely bad.

How can you prevent the above code from failing? According to ISO/IEC 14882:2003, the standard describing C++, section 2.13.1, paragraph 2, an integer literal is compared successively against a list of integer types until one that can represent it is found. If the literal is written in decimal and has no suffix, the compiler first tries int. If int is insufficient to contain the value, long is tried. If long cannot hold the constant either, the behavior is undefined. That is, the compiler is free to do whatever it feels like. It gets better. If the constant is written in octal or hexadecimal, int, unsigned int, long and unsigned long are tried in turn. Again, if none of those types can hold the constant, it results in undefined behavior.

If the constant is written in decimal and suffixed by u, the compiler understands that it is an unsigned constant of some sort, so it tries unsigned int and then unsigned long, again resulting in undefined behavior if the constant is too large for either.

If the constant is suffixed with l, it is at least a long, but the compiler may fall back to unsigned long if the value exceeds the range of long. Undefined behavior results if the constant exceeds unsigned long.

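It is only when the constant is suffixed with ll that the compiler considers long long, a type C++03 does not even define: it comes from C99, and C++ compilers such as gcc support it as an extension. C99 guarantees that long long is at least 64 bits wide, which is twice the width of long on 32-bit x86 but the same width as long on LP64 systems; beyond that minimum, its size is implementation-specific.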

Using gcc 4.2.4 (not the latest version, but the version I have on my AMD64 box), I get the warning:

suffixes.c:7: warning: large integer implicitly truncated to unsigned type

and the program prints -1430928461, which is neither the value wanted nor the one expected.

So, what went wrong exactly? The code above assumes that int is wider than 32 bits, and that's where it fails. In C (and therefore in C++), unlike in languages such as Java, the sizes of the integer types are machine-specific, dictated by considerations such as the compilation (or data) model (see a previous entry on the topic) and the underlying microprocessor. The type int is usually mapped onto an efficient, machine-default data type. On x86, int is 32 bits wide, while long and long long are at least as wide as int. The ILP32 model (32-bit x86) provides 64-bit integers only through long long; the LP64 model (most 64-bit Unix systems, including AMD64 Linux) makes both long and long long 64 bits wide.

How do we make absolutely sure that we get exactly what we expect? We use the platform-safe type definitions from C99's <stdint.h> and <stddef.h>, such as the exact-width uint64_t. We rewrite the program as:

uint64_t thingie=123123123123;

Yet this code is still not safe. Remember the type determination rules we just enumerated? The type of the literal is decided on its own, before the assignment and regardless of the type of the variable it initializes, and nothing guarantees that long long is considered. You may very well get the same warning and see a truncated value assigned to your constant thingie, because long happens to be too short on your platform and only long long is large enough to hold the constant. But there is hope. We can use ll as a suffix, yielding:

uint64_t thingie=123123123123ll;

Yet this code is still not safe, because, again, the behavior of the ll suffix is implementation-specific. So you probably need a macro (something I normally advocate against whenever I can) to wrap the constant so that the correct suffix is generated. Fortunately, the standard provides such macros: C99 defines INTN_C and UINTN_C in <stdint.h>. Browsing glibc's stdint.h, we find:

/* The ISO C99 standard specifies 
   that in C++ implementations these
   macros should only be defined 
   if explicitly requested.  */
#if !defined __cplusplus || defined __STDC_CONSTANT_MACROS

/* Signed.  */
# define INT8_C(c)	c
# define INT16_C(c)	c
# define INT32_C(c)	c
# if __WORDSIZE == 64
#  define INT64_C(c)	c ## L
# else
#  define INT64_C(c)	c ## LL
# endif

...more...

So we rewrite yet again the code as:

#define __STDC_CONSTANT_MACROS // needed in C++ for the *_C macros; could come from the Makefile as well
#include <stdint.h> // even in C++

...

uint64_t thingie = UINT64_C(123123123123);

Yay! Type-safe code!

Ok, now, what about float and double?!

Let’s look at those next week.

6 Responses to Safer Integer Constants

  1. Tom says:

    Minor nitpick – you should use UINT64_C, not __UINT64_C. The former is standard, the latter is probably GNU-specific.

    I just had a run-in with this recently, with a non-C++ developer. I think it needs to be stressed that for constant values, the compiler does not consider usage. Hence, whether you say “double foo = 123123123123” or “const char[] bar = 123123123123”, the literal constant is always evaluated the same.

  2. Steven Pigeon says:

    You’re right about __UINT64_C. My bad. Corrected.

    As for constant being evaluated the same, I’m not sure what you mean. In the second example, the compiler should complain that you’re making a pointer from an integer, or something similar.

    Indeed, it spews

    error: invalid conversion from ‘int’ to ‘const char*’

  3. Tom says:

    My point is that it said “invalid conversion from ‘int'”… as opposed to “invalid conversion from ‘long long'”. The truncation occurs before the assignment, and that can trip up naive developers.

  4. If I use 123123123123, it does generate

    error: invalid conversion from ‘long int’ to ‘const char*’

    (because I’m in LP64, I only have long rather than long long, but same difference)

  5. […] long long for atoll, etc. The first problem is that, as we discussed in a series of previous posts (here, and here) the size of int, long, etc., vary from platform to platform, so it is not clear how […]

  6. […] could say), integer arithmetic is subject to a number of pitfalls, some I have already discussed here, here, and here. This week, I discuss yet another occasion for error using integer […]
