March 21, 2017
The idea of reorganizing data before compression isn’t new. Almost twenty-five years ago, Burrows and Wheeler proposed a block-sorting transform that reorders data to help compression. The idea is to create repetitions that a second compression engine can exploit.

But the Burrows-Wheeler transform isn’t the only possible one. There are a few other techniques to generate (reversible) permutations of the input.
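As a concrete illustration, here is a naive sketch of the Burrows-Wheeler transform itself (not how production implementations do it; those use suffix arrays and an end-of-string sentinel to make the transform efficiently invertible): sort all rotations of the input and keep the last column.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Naive Burrows-Wheeler transform: form every rotation of the input,
// sort them, then take the last character of each sorted rotation.
// Characters that appear in similar contexts end up adjacent, which
// creates the runs a second-stage compressor can exploit.
std::string bwt(const std::string& s)
{
    std::vector<std::string> rotations;
    for (std::size_t i = 0; i < s.size(); i++)
        rotations.push_back(s.substr(i) + s.substr(0, i));
    std::sort(rotations.begin(), rotations.end());

    std::string last;
    for (const auto& r : rotations)
        last += r.back();
    return last;
}
```

For example, `bwt("banana")` yields `"nnbaaa"`: the three `a`s and two `n`s are grouped, even though they were interleaved in the input.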


Leave a Comment » | algorithms, C-plus-plus, data compression, hacks | Tagged: Burrows, Burrows Wheeler Transform, compression, hash table, Linear probing, Quadratic probing, Transform, Wheeler | Permalink

Posted by Steven Pigeon

November 17, 2015
In the previous entries, we learned that a good hash function for look-ups should disperse bits as much as possible and be unpredictable, that is, behave more or less like a pseudo-random number generator. We had a few failed attempts, a few promising ones, and now, a good one.
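The post’s own candidate isn’t shown in this excerpt. As an illustration of the dispersal idea only (not necessarily the function the post develops), here is a well-known mixer of this kind, the 64-bit finalizer from MurmurHash3:

```cpp
#include <cstdint>

// The 64-bit finalizer from MurmurHash3: alternating xor-shift and
// odd-constant multiply rounds. Every step is invertible, so the
// function is a bijection on 64-bit values, and each input bit ends
// up influencing roughly half the output bits ("avalanche").
uint64_t mix64(uint64_t x)
{
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    x *= 0xc4ceb9fe1a85ec53ULL;
    x ^= x >> 33;
    return x;
}
```

Because every round is invertible, distinct inputs always map to distinct outputs; the only fixed point is zero.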


Leave a Comment » | algorithms, bit twiddling, data structures | Tagged: hash, hash function, hash table, hashes for look-up | Permalink

Posted by Steven Pigeon

October 27, 2015
Last week’s hash functions (the check-sum, Knuth’s, and the Gray-code one) produced less than optimal results. A function based only on addition, such as the check-sum, cannot possibly produce very large numbers, and therefore fails to distribute items over a very large table. That’s why a confusion step follows the combination step: to spread the bits around the (machine-sized) word.

So the confusion step must explicitly shuffle the bits around (although not necessarily as a permutation) and make sure that the most- and least-significant bits get thrown around. Let’s try a couple of things!
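A minimal sketch of that combine-and-confuse pattern, assuming a rotate-based confusion step (my illustration, names hypothetical; the post’s actual functions may differ):

```cpp
#include <cstddef>
#include <cstdint>

// Rotate left: unlike a shift, no bits are lost, so low-order
// contributions eventually reach the most-significant bits.
static inline uint64_t rol(uint64_t x, int r)
{
    return (x << r) | (x >> (64 - r));
}

// Combine bytes by addition (as the check-sum does), but rotate the
// state between steps so the sum no longer piles up in the low bits
// and byte order now matters.
uint64_t rot_checksum(const char* data, std::size_t n)
{
    uint64_t h = 0;
    for (std::size_t i = 0; i < n; i++)
        h = rol(h, 7) + (unsigned char)data[i];  // confuse, then combine
    return h;
}
```

With the rotation, `"ab"` and `"ba"` hash differently, whereas a plain check-sum maps them to the same value.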


1 Comment | algorithms, data structures | Tagged: hash function, hash functions, hash table, hashes for look-up, hashing, rol, ror | Permalink

Posted by Steven Pigeon

October 20, 2015
Devising a good, fast, simple hash function isn’t easy. At all. It’s quite the challenge, in fact. Unsurprisingly, the first tries will be somewhat disappointing, but that shouldn’t keep you from trying. This week, let’s consider a couple of simple (and not very good) functions.
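For instance, the check-sum hash discussed in this series (a deliberately weak sketch) simply adds the bytes of the key:

```cpp
#include <cstdint>
#include <string>

// Check-sum hash: add the bytes. It ignores order ("ab" and "ba"
// collide) and cannot produce large values for short keys, so items
// cluster at the low end of a big table.
uint64_t checksum_hash(const std::string& key)
{
    uint64_t h = 0;
    for (unsigned char c : key)
        h += c;
    return h;
}
```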


1 Comment | algorithms, data structures | Tagged: Gray, hash, hash functions, hash table, hashes for look-up, Knuth | Permalink

Posted by Steven Pigeon

September 29, 2015
Hash tables are a great way of ensuring fast access to your data. Well, they are, but only as long as the hash function is good.

But what exactly makes a hash function a good hash function?
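One simple way to probe the question empirically (an illustrative experiment, not the post’s methodology; all names here are mine) is to hash many distinct keys into a fixed number of buckets and look at how evenly they spread:

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hash n_keys distinct keys into n_buckets and count how many land
// in each bucket. A good hash function should come close to the
// uniform count n_keys / n_buckets everywhere; a bad one piles items
// into a few buckets.
std::vector<int> bucket_counts(const std::function<uint64_t(const std::string&)>& h,
                               int n_keys, int n_buckets)
{
    std::vector<int> counts(n_buckets, 0);
    for (int i = 0; i < n_keys; i++)
        counts[h("key" + std::to_string(i)) % n_buckets]++;
    return counts;
}
```

Plug any candidate function into this harness and compare the bucket counts against the uniform expectation.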


Leave a Comment » | algorithms, data structures | Tagged: hash, hash function, hash table, hashes for look-up, hashing, random, uniform random | Permalink

Posted by Steven Pigeon

September 17, 2013
Today, let’s talk about hash tables. Or, more precisely, about a secondary probing technique that uses some number theory: not quadratic residues, just relatively prime numbers.

I don’t know if you’re like me, but it often bugs me when I read something in a textbook that seems to make sense, but is claimed without proof or demonstration (oh, I think I said that before). That seems to happen particularly often when I read about data structures, and this time I decided to explore one of those claims.
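The claim in question can be checked directly: when the probe step s is relatively prime to the table size m, the probe sequence h, h+s, h+2s, … (mod m) visits every slot exactly once, so an insertion cannot fail while a free slot remains. A small sketch (my code, not the textbook’s or the post’s):

```cpp
#include <cstddef>
#include <vector>

// Follow the probe sequence start, start+step, start+2*step, ...
// (mod m) for m steps and report whether it covered every slot
// without revisiting one. This succeeds exactly when gcd(step, m) = 1.
bool visits_all_slots(std::size_t start, std::size_t step, std::size_t m)
{
    std::vector<bool> seen(m, false);
    std::size_t pos = start;
    for (std::size_t i = 0; i < m; i++) {
        if (seen[pos]) return false;  // cycled before covering the table
        seen[pos] = true;
        pos = (pos + step) % m;
    }
    return true;
}
```

With m = 10, a step of 7 (coprime to 10) covers all ten slots, while a step of 4 (gcd 2) cycles through only five of them.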


6 Comments | algorithms, Mathematics | Tagged: cargo cult, Coprime, golden ratio, hash table, Phi, prime, Prime number, relatively prime | Permalink

Posted by Steven Pigeon

October 13, 2009
Programmers aren’t always the very rational beings they like to believe they are. Very often, we close our eyes and make decisions based on what we *think* we know, and on what we have been told by more or less reliable sources, such as choosing red-black trees over AVL trees because they are faster, while not quite being able to justify in detail why that must be so. Programming based on this kind of decision is what I call *cargo cult programming*.

Originally, I wanted to talk about red-black *vs.* AVL trees and how they compare, but instead I’ll talk about the STL `std::map`, implemented using red-black trees in G++ 4.2, and `std::unordered_map`, a hash-table-based container introduced in TR1.
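A quick sketch of the two containers side by side (my illustration, not the post’s benchmark): both offer the same insert/look-up interface, but the red-black tree keeps keys sorted while the hash table does not.

```cpp
#include <map>
#include <unordered_map>
#include <vector>

// Insert the same keys into both containers, then read them back from
// the std::map. The tree's in-order traversal yields ascending keys,
// at O(log n) per operation; std::unordered_map gives expected O(1)
// operations but an unspecified iteration order.
std::vector<int> sorted_keys(const std::vector<int>& input)
{
    std::map<int, int> tree;          // red-black tree
    std::unordered_map<int, int> ht;  // hash table
    for (int k : input) { tree[k] = k; ht[k] = k; }

    std::vector<int> keys;
    for (const auto& kv : tree)       // in-order: ascending keys
        keys.push_back(kv.first);
    return keys;
}
```

Whether the O(log n) ordered structure or the expected-O(1) unordered one is faster in practice is exactly the kind of question that deserves measurement rather than cargo cult.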


3 Comments | algorithms, data structures, Design, hacks, Mathematics, programming, theoretical computer science | Tagged: algorithmics, assumptions, AVL tree, balancing tree, C, cargo, cargo cult, complexity, hash table, magic, magic thinking, red-black tree, Scrabble, Splay Tree, std::map, std::unordered_map, stl, TR1 | Permalink

Posted by Steven Pigeon