Safer Integer Types (part III)

A few days ago (again at the time of writing, but since I accumulate and schedule posts for a weekly release, that may already mean a few months ago) a friend was a bit nonplussed by the fact that an expression such as:

int x;
unsigned int y;

if (x+y<0)
 {
  ...
 }
&#91;/sourcecode&#93;<br><br>

was simply <em>never</em> true. At first, you're thinking "how can this be?" because you're trying to find values of x and y that, summed together, are obviously negative. That is, without counting on the surprising integer promotion/integral conversion system of C and C++.<br><br>

Let us see what's going on exactly.<br><br>
<span id="more-1074"></span>

To solve this problem, we have basically 3 important sections of the standard to look-up. In the case of C++ (ISO/IEC 14822:2003), these sections are <em>4.5 Integral Promotions</em>, <em>4.7 Integral Conversion</em> and, lastly, <em>5 Expressions</em>.<br><br>

The first section explains how integral types (that is, that can hold only integer values) are converted between each other.<br><br>

<blockquote> 
<b>4.5 Integral promotions</b><br><br>

An rvalue of type char, signed char, unsigned char, short int, or unsigned short int can be converted to an rvalue of type int if int can represent all the values of the source type; otherwise, the source rvalue can be converted to an rvalue of type unsigned int.<br><br>

An rvalue of type wchar_t (3.9.1) or an enumeration type (7.2) can be converted to an rvalue of the first of the following types that can represent all the values of its underlying type: int, unsigned int, long, or unsigned long.<br><br>
 
An rvalue for an integral bit-field (9.6) can be converted to an rvalue of type int if int can represent all the values of the bit-field; otherwise, it can be converted to unsigned int if unsigned int can rep resent all the values of the bit-field. If the bit-field is larger yet, no integral promotion applies to it. If the bit-field has an enumerated type, it is treated as any other value of that type for promotion purposes.<br><br>

An rvalue of type bool can be converted to an rvalue of type int, with false becoming zero and true becoming one.<br><br>

These conversions are called <i>integral promotions</i>.
</blockquote>

The second section we're interested in explain how values are converted when assigning from one type to another.<br><br>

<blockquote>
<b>4.7 Integral conversions</b><br>

An rvalue of an integer type can be converted to an rvalue of another integer type. An rvalue of an enumeration type can be converted to an rvalue of an integer type.<br><br>

If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2<sup>n</sup> where n is the number of bits used to represent the unsigned type). [Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). ]<br><br>

If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined. If the destination type is bool, see 4.12. If the source type is bool, the value false is converted to zero and the value true is converted to one.<br><br>

The conversions allowed as integral promotions are excluded from the set of integral conversions.<br><br>
</blockquote>

Finally, from section 5, <b>Expressions</b>, &para;&nbsp;9

<blockquote>
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:<br><br>

<ul>
  <li>If either operand is of type long double, the other shall be converted to long double.</li><br>
  <li>Otherwise, if either operand is double, the other shall be converted to double.</li><br>
  <li>Otherwise, if either operand is float, the other shall be converted to float.</li><br>
  <li>Otherwise, the integral promotions (4.5) shall be performed on both operands.</li><br>
  <li>Then, if either operand is unsigned long the other shall be converted to unsigned long.</li><br>
  <li>Otherwise, if one operand is a long int and the other unsigned int, then if a long int can rep- resent all the values of an unsigned int, the unsigned int shall be converted to a long int; otherwise both operands shall be converted to unsigned long int.</li><br>
  <li>Otherwise, if either operand is long, the other shall be converted to long.</li><br>
  <li>Otherwise, if either operand is unsigned, the other shall be converted to unsigned.</li><br>
  [Note: otherwise, the only remaining case is that both operands are int ]
</ul>
</blockquote>

So, what is going on in the case of <tt>x+y</tt> we had earlier? Well, clearly, we have the rule that if <em>either</em> operands is <tt>unsigned</tt>, the other is also converted to <tt>unsigned</tt>, and there is no more mystery. So, how can we express the intended idea given the rules of integer promotion, conversion, and expressions? since the <tt>int</tt> is converted minimally (it still uses the same bit pattern), we can't set up a (simple) test that involves conversion that would work in all cases.<br><br>

Because even if we converted, say, <tt>y</tt> from <tt>unsigned</tt> to <tt>long</tt> or even <tt>long long int</tt>, it doesn't mean that <tt>long</tt> or <tt>long long int</tt> is any larger than <tt>int</tt>, bringing us back to the case where the <tt>unsigned</tt> is converted to <tt>int</tt> wreaking havoc again.<br><br>

I think that in this case, the only <em>safe</em> way of performing this test is to use a two step test:<br><br>


if ((x<0) && (-x>y)) 
 {
  ...
 }

If x is negative, it is converted to a positive value (then converted to unsigned) and compared to y. Even in the case where x=-2147483648 for 32 bits arithmetic, the above still works properly. In this case, x is indeed smaller than zero, but the conversion -x “fails” because the value is still -2147483648, but as -2147483648 can be represented as 0x8000000, the result x+y will be “negative” in two’s complement arithmetic.

It is readily verified:

for (int x=-32768;x<32768;x++) for (int y=0;y<65536;y++) { int sx=x; unsigned uy=y; bool a = (x+y<0); // no promotions here bool b = (x<0) && (-x>y);

if (a!=b)
cerr << "oh noes!! x=" << x << " y=" << y << endl; } [/sourcecode] terminates without printing oh noes!!.

*
* *

The problem doesn’t lie in the (apparently weird) C/C++ promotion and conversion system, but in modular arithmetic itself. Two’s complement is a great invention because it dispenses us from having different operations for addition and subtraction: subtraction is reduced to the addition of the modular complement of the number. But the modular nature of addition even renders the testing of the solution very hard because overflows are simply ignored and the result is ‘wrapped around’. The problem we had here is similar; there are some cases you just cannot test properly. Consider the following: a+b gives a result that is smaller than, say, b. Is it because a is negative? Everyday arithmetic says yes, but modular arithmetic says that maybe a+b was larger than 2n-1 and so it wrapped around to a (apparently) random value. In assembly language, you can trap wrap around by inspecting the carry flag, but you can’t do that in C or C++. You just have to be a lot more careful.

3 Responses to Safer Integer Types (part III)

  1. Tom says:

    static_cast is your friend. Arithmetic expression mixing signed and unsigned types is a maintenance nightmare, because there may be no indication as to whether the implicit promotion is a bug, or the intended behavior. Adding an explicit (and greppable) cast solves that quite nicely, IMHO.

    • Steven Pigeon says:

      static_cast helps you somewhat in C++ to make the intent explicit, but alas! does not exist in C. Would you suggest to use something like a macro to emulate (at least for the intent) static_cast ?

  2. […] arithmetic is subject to a number of pitfalls, some I have already discussed here, here, and here. This week, I discuss yet another occasion for error using integer […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: