Safer Integer Constants (Part II)

In C and C++, integer promotions do not behave the same way for signed and unsigned integers. In bit-twiddling hacks, it is not immediately apparent that this can lead to unexpected results when code is ported to a different architecture. This is a recurring topic whenever I discuss portability with friends. Very often, they argue that if it works on Windows and on 32-bit x86 Linux, that’s pretty much all there is to it.

Of course, I couldn’t disagree more. I have learned the hard way.

Windows has adopted Intel’s Newspeak-like terminology for integer sizes. A BYTE is what everyone expects, a WORD is 16 bits, a DWORD (double word) is 32 bits, a QWORD is 64 bits, and a DQWORD is a double quad word, which is, I suppose, double plus good as well as being 128 bits. So specifying data structures using those #defines is already bad, but assuming implicitly that (unsigned) int and pointers are always DWORDs is crimethink.
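To make the pointer half of that assumption concrete, here is a minimal sketch. It assumes the platform provides the optional C99 type uintptr_t from <stdint.h>, and the helper names are made up for illustration. A pointer stuffed into a 32-bit integer survives only where pointers happen to be 32 bits wide, whereas a round trip of a void pointer through uintptr_t is guaranteed to compare equal to the original.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of any Windows or standard API. */

/* Store a pointer in an integer type that is wide enough by definition. */
static uintptr_t pointer_to_bits(void *p)
{
    return (uintptr_t)p;
}

/* Recover the pointer; C99 guarantees it compares equal to the original. */
static void *bits_to_pointer(uintptr_t bits)
{
    return (void *)bits;
}

/* Nonzero when, as far as sizeof can tell, a pointer would fit
   in a 32-bit DWORD-style integer on this platform. */
static int pointer_fits_in_32_bits(void)
{
    return sizeof(void *) <= sizeof(uint32_t);
}
```

On a typical 64-bit platform pointer_fits_in_32_bits() returns 0, which is exactly the case where the DWORD assumption silently loses half of the address.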

The C99 standard provides the <stdint.h> header which, together with <stddef.h>, supplies unambiguous definitions of integer types, such as uint32_t. Using those when declaring variables is only the first step. As we saw in an earlier post, extra care is needed to get the right values at initialization. But arithmetic expressions need extra care as well.


    char x;
    long z = x << 24;

Assume that char is 8 bits and that long is indeed 32 bits. What can go very wrong with the above code?

Well, the expression x << 24 promotes x to int, not to long. If int is as wide as long, you won’t notice the error, because the run-time behavior is what you expect. If, however, you port this code to an architecture where int is only 16 bits, the shift count exceeds the width of the promoted type. The behavior is then formally undefined, and in practice you will typically just get 0, every time. With luck, your compiler complains:

safer.c:7: warning: left shift count >= width of type
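One way to stay out of trouble, sketched here under the assumption that the value is meant to be treated as an unsigned 8-bit quantity, is to convert the operand to the destination type before shifting, so that the shift no longer happens in a plain int. The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: move an 8-bit value into bits 24..31 of a 32-bit word. */
static uint32_t to_top_byte(unsigned char x)
{
    /* The cast is applied before the shift, so the shift is carried out
       in a type at least 32 bits wide, whatever the width of int. */
    return (uint32_t)x << 24;
}
```

With the cast in place, to_top_byte(0x12) yields 0x12000000 whether int is 16, 32, or 64 bits wide.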

If you’re out of luck and your compiler stays silent, you may search for that bug for quite a while. The same care is needed with expressions involving constants. For example, the following code gives the expected mask regardless of the size of int:

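    unsigned long upper_mask = -1 << 5;

The expression -1 << 5 has type int, and its value is then converted to unsigned long, yielding the mask 0xff...ffe0 regardless of the size of unsigned long. The value is correctly initialized because the sign of a negative value is carried along when it is converted to a wider type, so all the upper bits end up set. (Strictly speaking, C99 leaves the left shift of a negative value undefined, so this relies on the compiler doing the usual two’s complement thing.) Had we written: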

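    uint16_t x = -1; // 0xffff
    uint64_t mask = x << 5;

we would get 0x1fffe0 on a platform with a 32-bit int, which is a bit shorter than expected! Because x is unsigned, it is promoted to the int value 0x0000ffff, and there is no sign bit to propagate into the upper bits.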
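A sketch of one way to avoid the truncated mask: start from an all-ones value that already has the destination type, so nothing depends on how a narrower operand gets promoted. The function name and the explicit bounds check are illustrative assumptions, not something from the code above.

```c
#include <stdint.h>

/* Hypothetical helper: a 64-bit mask with the low `low_bits` bits cleared. */
static uint64_t upper_mask64(unsigned low_bits)
{
    /* Shifting by 64 or more would be undefined, so handle it explicitly. */
    if (low_bits >= 64)
        return 0;

    /* UINT64_MAX is all ones in 64 bits; the shift is done on an
       unsigned value, so it is well defined and simply drops the high bits. */
    return UINT64_MAX << low_bits;
}
```

Here upper_mask64(5) gives 0xffffffffffffffe0, the mask the uint16_t version was supposed to produce.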

So, if you’re writing for portability across platforms (something I did much more often in a previous life, when I did embedded programming on processors like the Hitachi H8S and Rabbit Semiconductor’s Z80-ish CPU), you have to pay extreme attention to these details. Even worse, you cannot even rely on the arithmetic being two’s complement, because the standard specifies that it could also be one’s complement or sign-magnitude (ISO/IEC 14882:2003, 3.9.1 § 7), although I know of no computers that do not use two’s complement.
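If you want to know which representation you are actually dealing with, a classic probe looks at the two low bits of -1. The sketch below is illustrative and the names are made up. It also shows that converting -1 to an unsigned type gives all ones under any of the three representations, because the standard defines that conversion by value rather than by bit pattern.

```c
#include <limits.h>

/* Hypothetical names for the three representations the standard allows. */
enum sign_repr {
    SIGN_MAGNITUDE  = 1,  /* -1 is 10...01, low two bits are 01 */
    ONES_COMPLEMENT = 2,  /* -1 is 11...10, low two bits are 10 */
    TWOS_COMPLEMENT = 3   /* -1 is 11...11, low two bits are 11 */
};

/* Classify the representation of signed int from the low bits of -1. */
static enum sign_repr detect_sign_repr(void)
{
    return (enum sign_repr)(-1 & 3);
}

/* All ones in unsigned long, without assuming two's complement:
   the conversion reduces -1 modulo ULONG_MAX + 1. */
static unsigned long all_ones(void)
{
    return (unsigned long)-1;
}
```

On every machine I would expect you to have at hand, detect_sign_repr() returns TWOS_COMPLEMENT, but all_ones() equals ULONG_MAX either way.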
