In one of the classes I teach, we end up writing assembly language programs. And while I explain the (sometimes very relative) benefits of writing assembly language, I use bubble sort as an example where even carefully crafted assembly language doesn’t mean much: it’s a bad algorithm to start with.


Except that it’s not quite true.

Read the rest of this entry »

Selection, Revisited.


When we think of searching, we generally think of searching a value in a sorted collection of some sort. For a simple array, this implies the array is sorted and that we use binary search or a variant. But what if we want to search by rank? In a sorted array, that’s not very hard: the kth item is in slot k. But what if the array is not sorted?

Read the rest of this entry »

the Dutch Flag Problem


While preparing my lecture notes on sorting, I rediscovered the Dutch flag problem proposed by Edsger W. Dijkstra quite a while ago. This problem is relevant in the context of sorting, especially for variants of Quicksort, where you want to create not two but three partitions.


Like many problems, the Dutch flag problem has a very simple statement. Say you have an array with three types of value, how can you arrange them so that all the items of the first type is at the beginning of the array, the items of the third at the end (and, of course, leaving the second type between the two)?

Read the rest of this entry »



The game is different whether we consider external data structures—that live partially or totally on disk or over the network—or internal data structures residing entirely in local memory. For one thing, none of the courses I had on data structures I had really prepared me to deal adequately with (large) external data structures; it was always assumed that the computer’s memory was somehow spacious enough to hold whatever data you needed.

However, when you’re dealing with files, you can’t really assume that you can access random locations to read, write, or exchange data at low cost, and you have to rethink your algorithms (or change plans altogether) to take the cost of slow external media into account. Sometimes an algorithm that doesn’t look all that useful in main memory suddenly makes more sense in external memory because of the way it scans its data. One of these algorithms is the Shellsort.

Read the rest of this entry »

Checksums (part I)


I once worked in a company specializing in embedded electronics for industrial applications. In one particular project, the device communicated through a RS-422 cable to the computer and seemed to return weird data once in a while, causing unwanted behavior in the control computer whose programming did not provide for this unexpected data. So I took upon myself to test the communication channel as it seemed that the on-board software was operating properly and did not contain serious bugs. I added a check-sum to the data packet and it turned out that some packets came in indeed corrupted despite the supposedly superior electrical characteristics of the RS-422 link.

After a few days’ work, I implemented the communication protocol that could detect and repair certain errors while reverting to a request to retransmit if the data was too damaged. I then started gathering statistics on error rate, number of retransmit, etc, and the spurious behavior on the controller’s side went away. My (metaphorically) pointy-haired boss opposed the modification because “we didn’t have any of these damn transmission errors until you put your fancy code in there“. Of course, this was an epic facepalm moment. I tried to explain that the errors have always been there, except that now they’re caught and repaired. Needless to say, it ended badly.


Notwithstanding this absurd episode, I kept using check-sum to validate data whenever no other layer of the protocol took care of transmission errors. So, this week, let us discuss check-sums and other error detection algorithms.

Read the rest of this entry »

The 10 (classes of) Algorithms Every Programmer Must Know About


In Tunnels of Doom!, I wrote that the disjoint sets algorithm is one of the very few algorithms every programmer should know. That got me thinking. Should? What about must? If everyone must know about disjoint sets, what other algorithms must every programmer know about?

I made a “top ten” list of algorithms and data structures every programmer must know about.

Read the rest of this entry »