New Block Order

March 21, 2017

The idea of reorganizing data before compression isn’t new. Almost twenty five years ago, Burrows and Wheeler proposed a block sorting transform to reorder data to help compression. The idea here is to try to create repetitions that can be exploited by a second compression engine.

But the Burrows-Wheeler transform isn’t the only possible one. There are a few other techniques to generate (reversible) permutations of the input.

Read the rest of this entry »


Much Ado About Nothing

March 7, 2017

A rather long time ago, I wrote a blog entry on branchless equivalents of simple functions such as sex, abs, min, max. The Sing EXtension instruction propagates the sign bit in the upper bits, and is typically used in the promotion of, say, a 16 bits signed value into a 32 bits variable.

But this time, I needed something a bit different: I only wanted the sign-extended part. Could I do much better than last time? Turns out, the compiler has a mind of its own.

Read the rest of this entry »


Whatever sums your floats

January 24, 2017

While flipping the pages of a “Win this interview” book—just being curious, not looking for a new job—I saw this seemingly simple question: how would you compute the sum of a series of floats contained in a array? The book proceeded with the simple, obvious answer. But… is it that obvious?

Read the rest of this entry »


Strings in C++ Switch/Case statements

January 10, 2017

Something that used to bug me—used to, because I am so accustomed to work around it that I rarely notice the problem—is that in neither C nor C++ you can use strings (const char * or std::string) in switch/case statement. Indeed, the switch/case statement works only on integral values (an enum, an integral type such as char and int, or an object type with implicit cast to an integral type). But strings aren’t of integral types!

In pure C, we’re pretty much done for. The C preprocessor is too weak to help us built compile-time expression out of strings (or, more exactly, const char *), and there’sn’t much else in the language to help us. However, things are a bit different in C++.

Read the rest of this entry »


Size(_t) matters!

December 27, 2016

Sometime last week, a tweet from @nixCraft prompted the question, quite ironically, how do you get the maximum (largest positive) value for an integer?

max_int

Read the rest of this entry »


Serializing Trees

September 6, 2016

Quite a while ago I discussed using flat arrays and address calculations to store a tree in a simple array. The trick wasn’t new (I think it’s due to Williams, 1964 [1]), but is only really practical when we consider heaps, or otherwise very well balanced trees. If you have a misshapen tree, that trick doesn’t help you. It doesn’t help you either if you try to serialize a misshapen tree to disk.

But what if we do want to serialize arbitrarily-shaped trees to disk? Is it painful? Fortunately, no! Let’s see how.

Read the rest of this entry »


Making a good random table

April 5, 2016

I am still experimenting with hash functions, and I was toying with the Zobrist hash function[1] that is best known for its use in chess engines. The hash function is conceptually simple: you need a large table of random numbers, indexed, in a chess application, by the position on the board of the piece and by the piece itself. To compute a hash for a whole board configuration, you simply xor all the random numbers together. The hard part is choosing the random numbers.

IMG_0405-small

Read the rest of this entry »