INT_MAX is a terrible NaN (Safer Integer Types, Part IV)

I came across a lovely bug lately. Integer arithmetic, especially in C and C++ it seems, is error-prone. In addition to the risk of having the wrong expressions altogether (a logic error, one could say), integer arithmetic is subject to a number of pitfalls, some I have already discussed here, here, and here. This week, I discuss yet another occasion for error using integer arithmetic.

Consider this piece of code, one that you have seen many times probably, at least as a variation on the theme:

int cost=INT_MAX;

/* here goes code that may
   or mayn't assign cost */

int adjusted_cost=cost+extra_cost;
if (cost+extra_cost<best_cost)
    best_cost=adjusted_cost;

Can you see what’s the problem with this code (besides the fact that it may or mayn’t assign cost)? Think it over before continuing.

*
* *

The problem is that since cost is int and is initialized to INT_MAX adding anything (other than zero) to it will cause to overflow, and wrap around, which means that cost+extra_cost is now negative, and is therefore (very possibly) smaller than best_cost. The exact behavior is implementation-specific, but it will break the algorithm.

There are basically tree solutions to this kind of problem. The first is to test whether or not the value is INT_MAX. The condition becomes something like:

if ( (cost!=INT_MAX) && ( cost+extra_cost<best_cost))
 {
  ...

(You may have noticed that this version isn’t much safer. If cost is INT_MAX-1 and extra_cost is 10, then it’s still broken. If fact, it is complicated to get arbitrary expressions like this one to behave correctly in all cases, that is, regardless of the of the values involved.)

The second solution is to replace int by an explicitly larger type but to use the same constant:

#include <stdint.h>

...

int64_t cost=INT32_MAX;

Now cost+extra_cost cannot overflow/wrap-around, and the code now behaves the way it was intended, but at the cost of manipulating larger integers (which can have implementation-specific performance issues).

The third solution requires analyzing your algorithm and knowing what are the values you operate on. What is the largest possible value for cost for this algorithm? Maybe it’s 100, maybe it’s 1 000 000, or maybe it’s 200 000 000? Maybe with extra_cost always less than 1024 and cost at 1 000 000 will provide safe arithmetic in all cases. Adjusting the maximal cost to a reasonable maximum that is far from INT_MAX is a variation on the theme of using uint64_t instead of only int except that it uses int (chosen as to be a machine-efficient size) and it provides an occasion to understand and document your algorithm.

*
* *

The basic fact is that getting integer arithmetic exactly right all the time given two’s complement, modular arithmetic (and thus finite integer sizes) is really complicated. Not always, but I invite the reader to consider again the second solution above, or even the original expression. Can you write a test that checks both for overflow and test the condition correctly? How many special case will it have? What if I change int to unsigned? will it still work?

If both values are unsigned (short, int), testing for (a+b)<max(a,b) correctly detects overflows for all possible pairs of a and b. If the values are signed, it fails on many negative pairs of a and b, for example, -32768 + -17488 will yield 15280 (assuming 16 bits integers). One possible solution is to compute the carry; if there’s a carry, there’s an overflow. While the carry is readily available to the CPU, C doesn’t help much here; you’ll have to write your own:

int will_overflow(int16_t i,
                  int16_t j)
{
  int16_t s=i+j;
  int32_t s32=i+j;
  return s!=s32;
}

7 Responses to INT_MAX is a terrible NaN (Safer Integer Types, Part IV)

  1. Fanael says:

    Your will_overflow function doesn’t really work, it returns “yes, it will overflow” for something as simple as -32768 and 0. I think there was a formula for checking for overflow in Hacker’s Delight, I’ll take a look when I’ll finally find this book.

    • Steven Pigeon says:

      The original

        uint32_t u_i = (uint16_t)i;
        uint32_t u_j = (uint16_t)j;
        return u_i+u_j > 32767;
      

      is clearly defective, as you rightly point out. I will fix that to:

      int will_overflow(int16_t i,
                        int16_t j)
      {
       int16_t s=i+j;
       int32_t s32=i+j;
       return s!=s32;
      }
      

      Does what it should. I’ve been looking in Hacker’s Delight and what is presented there seems to depends on the sign of the operands (leaving two branches of tests.) It does have the merit of not necessitating bigger registers to perform its tests.

      • Fanael says:

        I found it:

        int will_overflow(int16_t i, int16_t j)
        {
          /* cast to unsigned in order to avoid undefined behavior (signed overflow) */
          int16_t s=(uint16_t)i+(uint16_t)j;
          return ((~(i^j))&(s^i))>>15;
        }
        

        No branches ;)

  2. petke says:

    You may find this interesting: http://channel9.msdn.com/Shows/Going+Deep/David-LeBlanc-Inside-SafeInt

    It’s about integer wrapper types that automatically detects any overflow or divide by zero. It looks very useful. I wish it was included in boost.

  3. jason says:

    I always avoided this issue by just assigning it to a large number instead of int_max. for example, if you know the value is unlikely to be more than 10,000,000 you can just put 9999999 or something.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: