Initializing Arrays


Initializing large arrays of data before use is always cumbersome, but it seems unavoidable.

The first type of solution fills the array with a value that denotes an empty entry. While this sounds straightforward, the choice of that value is not always easy. For floating-point numbers, zero may or may not be a good candidate. While convenient (a floating-point zero is encoded as all binary zeroes, so the array can simply be zeroed out), it may be difficult to use in contexts where zero is a valid value. Fortunately, there’s always the possibility of initializing the array with NaNs, which can be tested for and handled correctly (but you need the POSIX functions to do that).




Two weeks ago, I went back to the Université de Montréal to pay a visit to my Ph.D. adviser. As we toured his laboratories, I noticed that a drawing I had scotch-taped to a wall (as a Master’s student) was still there after all those years.

…So it’s been there since 1994. Damn, I’m old!

Compressing Voxel Worlds


A friend of mine, Arthur, is developing a voxel-based game and found himself having to deal with large volumes of data. Unlike 2½D games where the map is essentially a 2D expanse with occasional relief, his game allows the definition of things in full (discrete) 3D.

To avoid loading the entire map into memory, he made the wise design decision of making his world tile-based: by loading only a small set of visible world blocks at all times, he simultaneously reduces the working set and keeps the map essentially open. So together we took a look at compressing these world blocks, and it turns out that we can do a lot with fairly simple algorithms and VLCs.


Wallpaper: Autumn


(Autumn, 1920×1200)



(A photographic experiment involving a kaleidoscope and some code)

Suggested Reading: Beautiful Evidence & The Visual Display of Quantitative Information


Edward R. Tufte — The Visual Display of Quantitative Information — 2nd ed., Graphics Press, 2001, 197p. ISBN 978-0-9613921-4-7

Edward R. Tufte — Beautiful Evidence — 2nd ed., Graphics Press, 2006, 213p. ISBN 978-0-9613921-7-8


Tufte is known for his work on the visual presentation of information, and all his know-how can be found in these two books: we discover his sober, well-documented, to-the-point style, and his careful analysis of each graph and display technique. His style is such that he conveys without ambiguity why he thinks something “works” or doesn’t.


Suggested Reading: Now You See It


Stephen Few — Now You See It: Simple Visualization Techniques for Quantitative Analysis — Analytics Press, 2009, 330p. ISBN 978-0-9706019-8-8


Having to review papers (and other manuscripts) quite often, I can say that one of the greatest weaknesses (after dismal engrish and bad overall structure) is the display of quantitative information. In this book (which is somewhat redundant with Show Me the Numbers: Designing Tables and Graphs to Enlighten), Few presents information-representation techniques from a decidedly business-oriented angle. While Tufte is deeply concerned with the æstheticism of graphics, Few is oriented toward their communicative power.

This book is « grand public »: it contains no math and discusses data analysis only in passing, despite a trend curve here, a variance there, and a box-plot elsewhere; the level of the text remains accessible.

INT_MAX is a terrible NaN (Safer Integer Types, Part IV)


I came across a lovely bug lately. Integer arithmetic, especially in C and C++ it seems, is error-prone. In addition to the risk of getting the expression wrong altogether (a logic error, one could say), integer arithmetic is subject to a number of pitfalls, some of which I have already discussed here, here, and here. This week, I discuss yet another occasion for error in integer arithmetic.

Consider this piece of code, one that you have probably seen many times, at least as a variation on the theme:


Linear Regression Revisited


I always disliked it when a book gives equations and formulas as if by fiat, without providing justification or demonstration; I don’t mind that it skips a few steps, as long as I can fill in the blanks myself if I care to. Linear regression is one of those formulas we see but would like to understand better.

So let us see how to derive the formulas for linear regression and see how they generalize to any number of unknowns.
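As a taste of the derivation: fit the line $y = ax + b$ to points $(x_i, y_i)$, $i = 1, \dots, n$, by minimizing the summed squared error, and set both partial derivatives to zero.

```latex
E(a,b) = \sum_{i=1}^{n} \left( y_i - a x_i - b \right)^2

% setting both partials to zero:
\frac{\partial E}{\partial a} = -2 \sum_{i=1}^{n} x_i \left( y_i - a x_i - b \right) = 0,
\qquad
\frac{\partial E}{\partial b} = -2 \sum_{i=1}^{n} \left( y_i - a x_i - b \right) = 0

% yields the two "normal equations":
a \sum_i x_i^2 + b \sum_i x_i = \sum_i x_i y_i,
\qquad
a \sum_i x_i + b\,n = \sum_i y_i

% solved by:
a = \frac{n \sum_i x_i y_i - \sum_i x_i \sum_i y_i}
         {n \sum_i x_i^2 - \left( \sum_i x_i \right)^2},
\qquad
b = \bar{y} - a\,\bar{x}
```

In matrix form the same minimization gives $\hat{\beta} = (X^{\top}X)^{-1}X^{\top}y$, which is how the formula generalizes to any number of unknowns.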




The game is different whether we consider external data structures—that live partially or totally on disk or over the network—or internal data structures residing entirely in local memory. For one thing, none of the courses I took on data structures really prepared me to deal adequately with (large) external data structures; it was always assumed that the computer’s memory was somehow spacious enough to hold whatever data you needed.

However, when you’re dealing with files, you can’t really assume that you can access random locations to read, write, or exchange data at low cost, and you have to rethink your algorithms (or change plans altogether) to take the cost of slow external media into account. Sometimes an algorithm that doesn’t look all that useful in main memory suddenly makes more sense in external memory because of the way it scans its data. One of these algorithms is the Shellsort.
