## just_enough

30/06/2020

The C99 <stdint.h> header provides a plethora of type definition for platform-independent safe code: int_fast16_t, for example, provides an integer that plays well with the machine but has at least 16 bits. The int_fastxx_t and int_leastxx_t defines doesn’t guarantee a tight fit, they provide an machine-efficient fit. They find the fastest type of integer for that machine that respects the constraint. But let’s take the problem the other way around: what about defines that gives you the smallest integer, not for the number of bits (because that’s trivial with intxx_t) but from the maximum value you need to represent?

07/03/2017

A rather long time ago, I wrote a blog entry on branchless equivalents of simple functions such as sex, abs, min, max. The Sing EXtension instruction propagates the sign bit in the upper bits, and is typically used in the promotion of, say, a 16 bits signed value into a 32 bits variable. But this time, I needed something a bit different: I only wanted the sign-extended part. Could I do much better than last time? Turns out, the compiler has a mind of its own.

## Whatever sums your floats

24/01/2017

While flipping the pages of a “Win this interview” book—just being curious, not looking for a new job—I saw this seemingly simple question: how would you compute the sum of a series of floats contained in a array? The book proceeded with the simple, obvious answer. But… is it that obvious?

## Size(_t) matters!

27/12/2016

Sometime last week, a tweet from @nixCraft prompted the question, quite ironically, how do you get the maximum (largest positive) value for an integer? 05/01/2016

Let’s make a detour through low-level programming this week. Let’s talk about bit-fields and some of their quirks. ## Enumerating 32 bits Floats

05/11/2013

This week, let’s go back to (low level) programming, with IEEE floats. To unit test a function of float, it does not sound unreasonable to just enumerate them all. But how do we do that efficiently? Clearly f++ will not get us there. Nor will the machine-epsilon (the std::numeric_limits::epsilon()) because this value works fine around 1, but as the value diverges from 1, the epsilon basically becomes useless. We would either need a magnitude-dependent epsilon (which the standard library does not provide) or a way of enumerating explicitly the floats in increasing or decreasing order (something also not provided by the standard library). Well, let’s see how we can do that

## Two Features I’d Like to See in C and C++

01/10/2013

This week, let’s discuss two features I’d really like to see in C and
C++; one trivial, one not so trivial. ## Shallow Constitude

19/03/2013

In programming languages, there are constructs that are of little pragmatic importance (that is, they do not really affect how code behaves or what code is generated by the compiler) but are of great “social” importance as they instruct the programmer as to what contract the code complies to. One of those constructs in C++ is the const (and other access modifiers) that explicitly states to the programmer that this function argument will be treated as read-only, and that it’s safe to pass your data to it, it won’t be modified. But is it all but security theater?

## Compressed Series (Part II)

12/03/2013

Last week we looked at an alternative series to compute $e$, and this week we will have a look at the computation of $e^x$. The usual series we learn in calculus textbook is given by $\displaystyle e^x=\sum_{n=0}^\infty \frac{x^n}{n!}$

We can factorize the expression as

## Float16

12/02/2013

The possible strategies for data compression fall into two main categories: lossless and lossy compression. Lossless compression means that you retrieve exactly what went in after compression, while lossy means that some information was destroyed to get better compression, meaning that you do not retrieve the original data, but only a reasonable reconstruction (for various definitions of “reasonable”). Destroying information is usually performed using transforms and quantization. Transforms map the original data onto a space were the unimportant variations are easily identified, and on which quantization can be applied without affecting the original signal too much. For quantization, the first approach is to simply reduce precision, somehow “rounding” the values onto a smaller set of admissible values. For decimal numbers, this operation is rounding (or truncating) to the $n$th digit (with $n$ smaller than the original precision). A much better approach is to minimize an explicit error function, choosing the smaller set of values in a way that minimizes the expected error (or maximum error, depending on how you formulate your problem).