The sorting algorithms we were taught in class were typically simplified versions of the algorithms that assumed that the data remained in memory and in contiguous memory locations, also known as an array. What we often have, however, are linked lists. Generalizing algorithms like the QuickSort to lists is not very hard in fact if we use a modification of the basic algorithm.

For lists with numerical keys (or values), there might be a simpler algorithm that scans the list in only one direction, so that simply linked lists are a perfectly cromulent data structure, that is, the radix sort.

The basic idea of the radix sort is extremely simple and best understood using the playing cards analogy. Say you have a normal deck of 52 cards that you have to sort in the French suite order—diamonds (), spades (♠), hearts () and clubs (♣), from ace (1) to King (13). A naïve approach would be so hold the unsorted cards in one hand and insert them one by one in the correct place with the cards being sorted, splayed out on the table.

A counter-intuitive but much more efficient way is to build thirteen stacks of cards, one for the aces, the deuces, up to the kings, and distribute the cards one by one from the unsorted hand onto the stacks. When this first pass is completed, the cards are still unsorted, but all the aces are together, so are the deuces, up to the kings. The second pass is to take the thirteen stacks so that the deuces are under the aces, the threes under the deuces, etc, with the kings completely under. You then form 4 stacks, one for each suit (, ♠, , ♣). Each time you draw a card, you send it to the appropriate stack. When you’re done, you just stack the four stacks together, and voilà! You have a sorted deck of cards.

Radix sort uses the same kind of techniques to sort a list of numerical values of known maximum bit width. However, instead of using suits and numerical card value, you would repeat a similar process with parts of the numerical key $K$ bits long, $k$ bits at a time.

Using $K$ bits, you necessarily need $2^k$ stacks. You will repeat the process $\lceil K/k \rceil$ times to form a completely sorted list.

For each key in the list, at iteration $i$, you take the $i$th bunch of $k$ bits from the key and send it to the corresponding stack in the $2^k$ stacks, the stack whose number is given by those $k$ bits considered as an integer. At iteration $i$, for a given key, the stack would be given something like (key >> (k*i)) & mask_k. the mask ensures that only the $k$ least significant bits are kept; mask_k=(1<<k)-1, $k$ 1 bits.

At the end of an iteration, you get $2^k$ stacks which you must re-stack into only one list. A naïve approach, say using STL containers, would ask for a lot of copies, allocation and de-allocation of list nodes. However, if you have an implementation where everything is accessible, you may notice that 1) even if you merge the stacks, the orders of items do not change, and 2) since it’s a linked list, why not link the top of the previous stack to the bottom of the next one? These two observations let us “sew” the stacks together as a single list, and the cost of the operation is not dependent on the number of items, but only in the number of stacks!

The two next figures illustrates the stacks and the stack “re-sewn”

The stacks as lists, after an iteration of distributing the items.

The stacks re-sewn together, without moving the nodes, now form a single list.

The algorithmic complexity of the radix sort depends on the numbers of items, $n$ and on the parameters $K$ and $k$. The number of iterations, $\lceil K/k \rceil$ does not depend on $n$ at all, and each iteration costs $O(n)$ operations since for each item, it suffice to compute the stack number and send it to that stack, and both operations can be computed in $O(1)$. The total complexity is therefore of $O(\frac{K}{k} n)$, which is linear in $n$. For $K$ bits key values, Radix sort is therefore faster than QuickSort which is $O(n \lg n)$.

Another interesting property of radix sort is that the lists are always scanned in the same direction, allowing simply linked lists (one with only a pointer to the next element). It may also be quite amenable to implementations where the data is laid out on many disks, as it was common in the olden days… and maybe quite necessary with monster data sets used in data mining applications.

*
* *

A bare-bone C99 example implementation would look something like that:

#include
#include
#include

// C99 Program (not ANSI C!)
//
// gcc -std=c99 -Wall -Wextra -O3

const int nb_numbers=1000; // any large number

////////////////////////////////////////
//
// a list node
//
struct list_node {
int number;
struct list_node * next;
};

////////////////////////////////////////
//
// structure that holds
// the head and tail of
// a list
//
struct list {
struct list_node * tail;
};

////////////////////////////////////////
//
// quite minimalistic append
//
void append( struct list * this_list, int number )
{
struct list_node * new_node = malloc( sizeof (struct list_node) );

new_node->number=number;
new_node->next=NULL;

if (this_list->tail)
{
this_list->tail->next = new_node;
this_list->tail = new_node;
}
else
}

////////////////////////////////////////
//
// appends a node to a list
void append_and_unlink_node( struct list * this_list,
struct list_node * this_node )
{
if (this_list->tail)
{
this_list->tail->next=this_node;
this_list->tail=this_node;
}
else

}

////////////////////////////////////////
//
// shows the content of a list
//
void show_list(struct list * this_list )
{
struct list_node * current = this_list->head;
int sorted=1;
int last=0;
while (current)
{
printf(” %d”,current->number);
if (last > current->number) sorted=0;
last=current->number;
current=current->next;
}
printf(“\n”);
if (sorted)
printf(“sorted.\n”);
}

////////////////////////////////////////
//
// list
//
//
void sort_list(struct list * this_list,
int base // in bits
)
{
int i,s;
return;

// do the iterations
//

// clears the buckets
for (b=0;bnext;

int num = current->number;
int bucket = (num >> s) & mask;

// necessary because here
// the node is unlinked from
// the main list
//
current=next;
}

// sews the buckets back together
// while checking for empty buckets
//
this_list->tail=buckets[b].tail;
}
else
// first non-empty bucket
//
{
this_list->tail=buckets[b].tail;
c=1;
}
}

}

// buckets pointers can be just
// dropped (no deallocation needed)
//
}

int main()
{
int i;
struct list this_list={NULL,NULL};

// fills the list with
// random numbers
//
srand(time(NULL));
RadixSort beats QuickSort, complexity-wise, whenever $\lceil\frac{K}{k}\rceil \leqslant \lg n$. $K$ is usually the number of bits of the machine-sized integer, something like 32 or 64, but ultimately depends on the key. Maybe the true value of $K$ is something like 24, or 19. Knowing $K$, one must solve for $k$ in order to have a sort that is faster than Quicksort. We know that $\lg n$ grows very slowly in $\lg n$ so we must make sure that $k$ is chosen properly. Using basic algebra, we find that $k \geqslant \frac{K}{\lg n}$, so $k$ can be chosen adaptively depending on both $K$ and $n$.
If $k=1$ and $K \approx \lg n$, the radix sort will behave more or less as a QuickSort, complexity-wise, yielding $O(n \frac{K}{k})=O(n \lg n)$ complexity. However, it would not suffer QuickSort’s $O(n^2)$ worst case complexity: it remains exactly always $O(\frac{K}{k}n)$.