New Block Order

March 21, 2017

The idea of reorganizing data before compression isn’t new. Almost twenty-five years ago, Burrows and Wheeler proposed a block-sorting transform that reorders data to help compression. The idea is to create repetitions that a second compression engine can exploit.

But the Burrows-Wheeler transform isn’t the only possible one. There are a few other techniques to generate (reversible) permutations of the input.
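To make the idea concrete, here is a minimal sketch of the forward Burrows-Wheeler transform (not code from the post): it sorts all rotations of the input and keeps the last column, which tends to group identical characters into runs. Production implementations use suffix arrays rather than this O(n² log n) sort, but the effect is the same.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Naive forward Burrows-Wheeler transform: sort all rotations of the
// input and keep the last character of each. O(n^2 log n); real
// implementations use suffix arrays, but the effect is the same.
std::string bwt(const std::string& s)
{
    const std::size_t n = s.size();
    std::vector<std::size_t> rot(n);
    std::iota(rot.begin(), rot.end(), 0);

    // order the rotations lexicographically, comparing character by
    // character and wrapping around the end of the string
    std::sort(rot.begin(), rot.end(),
              [&](std::size_t a, std::size_t b)
              {
                  for (std::size_t i = 0; i < n; i++)
                  {
                      char ca = s[(a + i) % n], cb = s[(b + i) % n];
                      if (ca != cb) return ca < cb;
                  }
                  return false;
              });

    std::string out;
    out.reserve(n);
    for (std::size_t r : rot)
        out += s[(r + n - 1) % n];   // last column of the sorted rotations
    return out;
}
```

On "banana", for instance, this yields "nnbaaa": the repeated characters are pushed together into runs, which is exactly what a second-stage compressor can take advantage of.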



…And a Good One (Hash functions, part VI)

November 17, 2015

In the previous entries, we learned that a good hash function for look-ups should disperse bits as much as possible and be unpredictable; that is, it should behave more or less like a pseudo-random number generator. We had a few failed attempts, a few promising ones, and now, a good one.
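The function itself is in the full post; as a stand-in, here is a sketch of the general shape such a function takes: a cheap per-byte combination step (FNV-1a-style here) followed by a strong final mix in the style of MurmurHash3’s 64-bit finalizer, so the output looks pseudo-random even for similar keys. The names and constants below are illustrative, not the post’s.

```cpp
#include <cstdint>
#include <string>

// Illustration only, not the post's function. A cheap combination step
// folds each byte in; a final xor-shift/multiply mix (in the style of
// MurmurHash3's 64-bit finalizer) avalanches the bits so that similar
// keys still produce wildly different, pseudo-random-looking values.
std::uint64_t mix64(std::uint64_t k)
{
    k ^= k >> 33; k *= 0xff51afd7ed558ccdULL;
    k ^= k >> 33; k *= 0xc4ceb9fe1a85ec53ULL;
    k ^= k >> 33;
    return k;
}

std::uint64_t good_hash(const std::string& key)
{
    std::uint64_t h = 0xcbf29ce484222325ULL;   // arbitrary non-zero seed
    for (unsigned char c : key)
        h = (h ^ c) * 0x100000001b3ULL;        // combination step (FNV-1a style)
    return mix64(h);                           // confusion / finalization step
}
```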



Three (somewhat) Better Functions (Hash functions, part V)

October 27, 2015

Last week’s hash functions (the checksum, Knuth’s, and Gray) produced less than optimal results. A function based only on addition, such as the checksum, cannot possibly produce very large values, and therefore fails to distribute items over a very large table. That’s why a confusion step follows the combination step: to spread the bits around the (machine-sized) word.

So the confusion step must explicitly shuffle the bits around (although not necessarily as a permutation) and make sure that the most- and least-significant bits get thrown around. Let’s try a couple of things!
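The post tries specific candidates; the sketch below only illustrates the two-step structure just described, with a hypothetical confusion step built from a rotation, a shift, and an odd multiplier so that both high and low bits get moved.

```cpp
#include <cstdint>
#include <string>

// Sketch of the two-step structure only, not the post's candidates.
// The combination step folds each byte in; the (hypothetical) confusion
// step rotates, shifts, and multiplies so that the most- and
// least-significant bits all get thrown around.
static inline std::uint64_t rotl(std::uint64_t x, int r)
{
    return (x << r) | (x >> (64 - r));
}

std::uint64_t combine_then_confuse(const std::string& key)
{
    std::uint64_t h = 0;
    for (unsigned char c : key)
    {
        h += c;                        // combination: addition alone clusters low
        h = rotl(h, 27) ^ (h >> 5);    // confusion: not a permutation, but spreads bits
        h *= 0x9e3779b97f4a7c15ULL;    // odd multiplier pushes entropy into high bits
    }
    return h ^ (h >> 32);              // final fold: the low bits see the high bits
}
```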



Three bad functions (Hash functions, part IV)

October 20, 2015

Devising a good, fast, simple hash function isn’t easy. At all. It’s quite the challenge, in fact. Unsurprisingly, the first tries will be somewhat disappointing, but that shouldn’t keep you from trying. This week, let’s consider a couple of simple (and not very good) functions.
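As a taste of how “simple” goes wrong, here is a sketch of the most naive candidate, a plain byte-sum checksum (the series does discuss a checksum, though the exact version here is my guess): short keys can only produce small values, so most of a large table stays empty, and anagrams collide outright.

```cpp
#include <cstdint>
#include <string>

// A deliberately bad hash: just sum the bytes. Short ASCII keys can only
// reach a few thousand, so in a large table everything piles up near the
// bottom, and anagrams ("stop", "pots", "tops") collide outright.
std::uint64_t checksum_hash(const std::string& key)
{
    std::uint64_t h = 0;
    for (unsigned char c : key)
        h += c;
    return h;
}
```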



Hash Functions (Part I)

September 29, 2015

Hash tables are a great way of ensuring O(1) access to your data. Well, they are, but only as long as the hash function is good.
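To see why the caveat matters, consider the degenerate case sketched below (illustrative, not from the post; the hasher name is hypothetical): when every key lands in the same bucket, an std::unordered_map still gives correct answers, but each look-up becomes a linear scan of one long bucket, and the O(1) promise evaporates.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Worst case: a hasher that sends every key to the same bucket. The table
// still answers correctly, but each look-up walks one long bucket, so the
// promised O(1) silently becomes O(n).
struct terrible_hash
{
    std::size_t operator()(const std::string&) const { return 0; }
};

int main()
{
    std::unordered_map<std::string, int, terrible_hash> table;
    table["alpha"] = 1;
    table["beta"]  = 2;
    table["gamma"] = 3;
    return table.find("gamma")->second;   // correct, but a linear scan underneath
}
```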


But what exactly makes a hash function a good hash function?



Hash Table and Magic (Relatively Primes) Numbers

September 17, 2013

Today, let’s talk about hash tables. Or, more precisely, one type of secondary probing technique, one that uses some number theory—but not quadratic residues, just relatively prime numbers.

[Figure: golden spiral in rectangles]

I don’t know if you’re like me, but it often bugs me when I read something in a textbook that seems to make sense, but is claimed without proof or demonstration (oh, I think I said that before). That seems to happen particularly often when I read stuff about data structures, and this time I decided to explore one of those claims.
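The claim in question, worked out in the full post, boils down to this: in open addressing, if the probe step is relatively prime to the table size, the probe sequence visits every slot exactly once before cycling, so a search never gets stuck orbiting a subset of the table. A minimal sketch (names are mine, not the post’s):

```cpp
#include <cstddef>
#include <numeric>   // std::gcd, C++17

// Open addressing with a fixed probe step: slot(i) = (h + i * step) % size.
// When gcd(step, size) == 1, the sequence visits every slot exactly once
// before repeating, so a probe eventually reaches any free or matching slot.
std::size_t probe(std::size_t h, std::size_t step, std::size_t i, std::size_t size)
{
    return (h + i * step) % size;
}

bool step_covers_table(std::size_t step, std::size_t size)
{
    return std::gcd(step, size) == 1;   // relative primality is the whole condition
}
```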



Cargo Cult Programming (part 1)

October 13, 2009

Programmers aren’t always the very rational beings they like to believe they are. Very often, we close our eyes and make decisions based on what we think we know, and on what we have been told by more or less reliable sources. Choosing red-black trees rather than AVL trees because they are faster, for example, without being quite able to justify in detail why it must be so. Programming based on this kind of decision is what I call cargo cult programming.


Originally, I wanted to talk about red-black vs. AVL trees and how they compare, but instead I’ll talk about the STL std::map, which is implemented using red-black trees in G++ 4.2, and std::unordered_map, a hash-table-based container introduced in TR1.
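The measurements are in the full post; at the interface level the two containers are nearly interchangeable, the difference being the ordering guarantee and the underlying structure. A minimal usage sketch (with C++11 naming; under TR1 the hash-based container lives in std::tr1):

```cpp
#include <map>
#include <string>
#include <unordered_map>

int main()
{
    // red-black tree underneath: keys kept sorted, O(log n) insert and find
    std::map<std::string, int> ordered;

    // hash table underneath: no ordering, expected O(1) insert and find
    std::unordered_map<std::string, int> hashed;

    ordered["rb-tree"]   = 1;
    hashed["hash-table"] = 1;

    return ordered.count("rb-tree") + hashed.count("hash-table");
}
```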
