Sorting Lists (part II)

Last week I showed you the radix sort on simply linked lists. This week, I will present a version of QuickSort modified to sort simply linked lists.

The traditional partition algorithm uses two cursors (or pointers) that start at opposite ends of the list and travel towards each other, swapping elements as needed, until they meet somewhere near the middle of the list. First, a pivot element is chosen: either the first element of the list, or the one in the middle, if it is easily accessible. Some algorithms (like the one I present here) estimate a pivot so that, on average, it will be close to the value of the element that would sit in the middle of the list if it were sorted. The left pointer advances while the element it points to is smaller than or equal to the pivot value. The right pointer backs up while the element it points to is larger than the pivot. When both pointers stop, they each point to an element that is on the wrong side of the partition, so the two elements are swapped. Swapping them ensures that the element on the left side is again smaller than or equal to the pivot and that the element on the right side is again larger than the pivot. We repeat this until the pointers meet somewhere around the center of the list.
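To make the mechanics concrete, here is a minimal sketch of that two-pointer partition on a plain array of ints; the function name and signature are mine, for illustration only, and it follows the standard Hoare formulation (which advances on strictly smaller elements rather than smaller-or-equal, so the cursors cannot run past the ends of the array):

// Sketch: classic two-pointer partition on an array of ints,
// with the middle element as pivot. Returns an index p such
// that a[lo..p] holds only elements <= pivot and a[p+1..hi]
// holds only elements >= pivot.
int partition( int a[], int lo, int hi )
 {
  int pivot = a[lo+(hi-lo)/2];
  int i = lo-1;
  int j = hi+1;

  for (;;)
   {
    do i++; while (a[i]<pivot);  // left cursor advances
    do j--; while (a[j]>pivot);  // right cursor backs up

    if (i>=j)
     return j;                   // the cursors met: done

    // both cursors stopped on elements that are on the
    // wrong side of the partition, so swap them
    int t=a[i]; a[i]=a[j]; a[j]=t;
   }
 }

The array version of QuickSort then recurses on a[lo..p] and a[p+1..hi].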

Now that the pointers have met in the middle of the list, we have two sub-lists. The left one holds the elements smaller than or equal to the pivot while the right one holds the elements larger than the pivot. Note that neither is sorted just yet. To make sure the list is eventually sorted, QuickSort is reapplied, recursively, to both sides of the list, until it has to sort a “degenerate” list of one element. When QuickSort reaches a degenerate list, it backtracks. Eventually, all the numbers are sorted.

This algorithm could be implemented using a doubly linked list and rather complicated pointer calisthenics to exchange nodes without moving the data or resorting to memory management [1]. However, the partition algorithm can be modified for simply linked lists while maintaining its general structure.

The partition algorithm can be transformed quite easily to scan the list of elements from left to right only. As before, a pivot value is chosen. Two lists, the left and the right list, are initialized empty. A pointer, let us call it the current pointer, scans the original list from left to right. If the element it points to is smaller than the pivot, it is transferred to the left list; otherwise it is transferred to the right list. When the current pointer is done scanning the original list, the partition is complete.

The code for the QuickSort, using last week’s framework, would be given by:

////////////////////////////////////////
//
// This version of QuickSort is considerably
// different from the (introductory) textbook
// version
// 
// This version is stable (in the sorting 
// sense of stable)
//
void quick_sort( struct list * this_list )
 {
  struct list left_list = {NULL,NULL};
  struct list right_list = {NULL,NULL};
  struct list_node * current = this_list->head;

  // check for degenerate cases
  if (this_list->head==this_list->tail) 
   return;

  // hack! source : http://aggregate.org/MAGIC/#Average%20of%20Integers
  //
  // computes the average of two numbers
  // in overflow-safe way
  //
  int x = this_list->head->number;
  int y = this_list->tail->number;
  int pivot = (x&y)+((x^y)>>1);

  while (current)
   {
    struct list_node * next = current->next;

    if (current->number<pivot)
     append_and_unlink_node( &left_list, current );
    else
     append_and_unlink_node( &right_list, current );

    current=next;
   }

  // pivoting may have failed us
  // by picking a pivot <= (but not <)
  // all the elements of the list,
  // leaving the left list empty
  if (left_list.head==NULL && right_list.head)
   {
    struct list_node * next = right_list.head->next;
    append_and_unlink_node( &left_list, right_list.head);
    right_list.head=next;
   }

  if (right_list.head==NULL && left_list.head)
   {
    struct list_node * next = left_list.head->next;
    append_and_unlink_node( &right_list, left_list.head);
    left_list.head=next;
   }

  if (left_list.head )  quick_sort( &left_list );
  if (right_list.head ) quick_sort( &right_list );

  // let's resew the lists
  left_list.tail->next = right_list.head;
  this_list->head=(left_list.head)  ? left_list.head : right_list.head;
  this_list->tail=(right_list.head) ? right_list.tail : left_list.tail;
 }

Note that after the partition, we have two special cases to handle, due to the predictive pivot selection. It may happen that one of the two lists is empty while the other one still holds all the elements, which would lead to endless recursion (and that’s bad). One option would have been to choose the pivot as the first element of the list, remove it from the list, and use a three-way recursion combining the left list, the pivot, and the right list. However, when the list is already sorted (or almost so), the risk that the first element is a bad pivot increases significantly.
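For completeness, this is roughly what that three-way variant could look like, reusing the same struct list, struct list_node, and append_and_unlink_node as above; it is a sketch of the alternative just described, not the version benchmarked below:

void quick_sort_three_way( struct list * this_list )
 {
  // empty or one-element lists are already sorted
  if (this_list->head==this_list->tail)
   return;

  struct list left_list = {NULL,NULL};
  struct list right_list = {NULL,NULL};

  // detach the first node and use it as the pivot
  struct list_node * pivot_node = this_list->head;
  struct list_node * current = pivot_node->next;
  int pivot = pivot_node->number;

  while (current)
   {
    struct list_node * next = current->next;

    if (current->number<pivot)
     append_and_unlink_node( &left_list, current );
    else
     append_and_unlink_node( &right_list, current );

    current=next;
   }

  // the pivot is excluded from both sub-lists, so the
  // recursion always shrinks and cannot loop forever
  if (left_list.head )  quick_sort_three_way( &left_list );
  if (right_list.head ) quick_sort_three_way( &right_list );

  // resew: left, then pivot, then right
  pivot_node->next = right_list.head;
  if (left_list.tail) left_list.tail->next = pivot_node;

  this_list->head = (left_list.head)  ? left_list.head : pivot_node;
  this_list->tail = (right_list.tail) ? right_list.tail : pivot_node;
 }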

*
* *

Now, let us compare the speed of this version of QuickSort with last week’s radix sort. Last week, the code example for radix sort used a 4-bit radix, leading to 16 stacks. This parameter can be varied and the number of stacks changed. The parameter k controls the number of times one iterates over the list. If the keys are K bits long, then the number of iterations is \lceil \frac{K}{k} \rceil. For K=32 and k=4, that means 8 iterations are performed. For k=5, we get 32 stacks but only 7 iterations. For k=8, we get 4 iterations. The cost of sewing the stacks back together is comparatively modest when n, the number of items, is somewhat large. Taking the sewing into account, we arrive at a complexity of O( \frac{K}{k}(n+2^k) ), where the 2^k term accounts for the sewing of the 2^k stacks if you use a k-bit radix. For k=8, this term is 256, which becomes negligible as n grows; 256 is already dwarfed by n=10^6.
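As a quick sanity check on that formula, plugging in n=10^6 and K=32 gives \frac{32}{4}(10^6+2^4) \approx 8.0 \times 10^6 elementary operations for k=4, versus \frac{32}{8}(10^6+2^8) \approx 4.0 \times 10^6 for k=8 (up to the hidden constant), so doubling the radix roughly halves the predicted work once the 2^k sewing term is negligible.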

QuickSort does not really offer similar trade-offs in performance. If the pivot is badly chosen at each partition, regardless of the partition algorithm that uses the pivot afterwards, the run-time may degrade to O(n^2). If you were using an array, it would be feasible to shuffle the keys beforehand so that the expected O(n \lg n) performance is met. However, you cannot shuffle a simply linked list very easily.
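For reference, shuffling an array of keys beforehand is a simple one-pass affair; a minimal Fisher-Yates sketch (the function name is mine, and rand() is used for brevity, modulo bias aside):

#include <stdlib.h>

// Fisher-Yates shuffle: after the call, every ordering of the
// n keys is (about) equally likely, which makes a bad pivot
// sequence for array-based QuickSort very unlikely.
void shuffle( int a[], int n )
 {
  for (int i=n-1; i>0; i--)
   {
    int j = rand() % (i+1);  // pick a position in [0, i]
    int t = a[i];            // swap a[i] and a[j]
    a[i] = a[j];
    a[j] = t;
   }
 }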

To compare the speed of the two algorithms, we will use three tools.

The first is a high-resolution timer such as the one provided by gettimeofday, which is accurate to the microsecond. On Windows, there is timeGetTime, which is only accurate to the millisecond, a more serious limitation.
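For example, a minimal timing helper built on gettimeofday could look like this (POSIX only; the helper name is mine):

#include <sys/time.h>

// current time in microseconds since the Epoch
long long now_usec(void)
 {
  struct timeval tv;
  gettimeofday(&tv, NULL);
  return (long long)tv.tv_sec * 1000000LL + tv.tv_usec;
 }

// typical use:
//
//   long long start   = now_usec();
//   quick_sort( &test_list );               // code under test
//   long long elapsed = now_usec() - start; // microseconds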

The second is a pseudo-random number generator capable of generating large quantities of random data. However, it should be clear that the pseudo-random number generator should be tweaked so that it emits numbers distributed the same way your target data actually is. In this test I use the uniform generator provided by the infamous rand function from the C standard library. In a more targeted test, I might replace it by a Zipf generator, or maybe a geometric random variable. This is of course very specific to what you are interested in sorting.
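For example, a hypothetical helper to build a test list of n uniform keys could look like the following; it assumes the node layout used by the QuickSort above (only the number and next fields need to be set), and both the helper name and that assumption are mine:

#include <stdlib.h>

// appends n nodes with keys drawn from rand(), i.e., uniform
// in [0, RAND_MAX]; swap rand() for a Zipf or geometric
// generator to test another key distribution
void fill_with_random_keys( struct list * this_list, int n )
 {
  for (int i=0; i<n; i++)
   {
    struct list_node * node = malloc(sizeof *node);
    node->number = rand();
    node->next   = NULL;

    if (this_list->tail)
     this_list->tail->next = node;
    else
     this_list->head = node;

    this_list->tail = node;
   }
 }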

The third tool is a spreadsheet, or maybe a more elaborate system such as Mathematica, to analyse the timings resulting from the experiments. One could plot the time to sort against the trial number to get a graph such as:

Trials for the Radix 8

A plot like that may give you some information but it fails in two very important ways. The first is that it looks like a time series while in fact it shows a collection of independent trials. The second is that the peak on the left is shown disproportionately large. Simply rescaling the vertical axis gives a much better result, showing that the variation is not all that significant:

Trials for the Radix 8, rescaled

Maybe the glitch is due to the computer doing something else while the test was running; we are talking about a difference of about 50 ms.

Moreover, this type of graph does not convey any precise information about the average behavior of the trials. It looks like a time series, and it is difficult to guess where the minimum and maximum are (excluding the peak) or what the variance is, all things that might help you reason about the expected behavior of the algorithm you are testing.

A very good tool to do that is the box plot. The box plot (or box-and-whiskers plot, or a similar variant) makes clear at least seven important pieces of information about your data: the minimum, the 5th percentile, the lower quartile, the median, the upper quartile, the 95th percentile, and the maximum. A series of trials is then collapsed into this box-and-whiskers, and you can see the median and the box that represents 50% of the trials. The whiskers extend to cover 90% of the trials. This figure, taken from Wikipedia, explains the situation very well:

Box Plots explained relative to PDF

So, using box plots may give you a lot more information when comparing series of trials than mere time-series-like graphs.
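If you want to compute those summary numbers yourself before handing them to a plotting tool, a rough sketch is to sort the timings and read off the percentiles; the function names are mine, and the simple nearest-rank rule is used where real box-plot tools may interpolate:

#include <stdlib.h>

// comparison function for qsort on doubles
int cmp_double( const void *a, const void *b )
 {
  double x = *(const double *)a;
  double y = *(const double *)b;
  return (x>y)-(x<y);
 }

// nearest-rank percentile of a sorted array of n values, 0<=p<=1
double percentile( const double *sorted, int n, double p )
 {
  int rank = (int)(p*(n-1)+0.5);
  return sorted[rank];
 }

// fills summary[7] with the min, 5th, 25th, 50th, 75th, 95th
// percentiles and max of the n timings (sorts the array in place)
void box_plot_summary( double *timings, int n, double summary[7] )
 {
  qsort(timings, n, sizeof *timings, cmp_double);

  const double p[7] = {0.0, 0.05, 0.25, 0.50, 0.75, 0.95, 1.0};
  for (int i=0; i<7; i++)
   summary[i] = percentile(timings, n, p[i]);
 }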

To perform the experiments, I used the uniform random number generator, unconstrained, that is, with keys between 0 and RAND_MAX, and successive series of 1000, 10000, 100000, and 1000000 numbers. With each algorithm and each data set, the experiment was repeated 100 times.

I performed tests with k=4, 5, and 8 for the radix sort. In the graphs, RS4 10000 refers to n=10000 and k=4. QS is QuickSort. After a few minutes, we get the following:

Run-times in the QuickSort vs Radix Sort experiment

The results can be differentiated more clearly using a log scale:

Run-time in the Quick Sort vs Radix Sort experiment, log scale

I do not show the results for n=1000 because at that scale, they are all basically zero (even though they are not zero in absolute terms). We see that the differences in run-time are not very significant when n is small. At n=10000, QuickSort and radix sort are comparable, with RS8 much faster. At n=1000000, however, radix sort performs much better than QuickSort, and all the more so as the radix k grows. At k=8, the radix sort is about three times as fast as QuickSort and exhibits a much tighter box plot.

Not only is radix sort faster than QuickSort, it is much less susceptible to bad behavior. QuickSort may degrade to O(n^2) complexity if the pivot is badly chosen. Radix sort, on the other hand, does not care much about the actual contents of the list and will perform similarly well regardless of the key distribution.

*
* *

How can radix sort be faster than QuickSort, when QuickSort supposedly has the best possible complexity, O(n \lg n), for a sorting algorithm?

That’s because this is only true for a comparison sort, where elements are compared to each other in order to be sorted. Radix sort, however, is not a comparison sort: it is an address transformation sort, and is therefore not constrained by the \Omega(n \lg n) lower bound.
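For the curious, the standard argument is that a comparison sort must be able to distinguish between the n! possible orderings of its input, and each comparison has only two outcomes, so its decision tree must have height at least \lg(n!) \geq \frac{n}{2} \lg \frac{n}{2} = \Omega(n \lg n). Radix sort never compares two keys against each other, so this bound simply does not apply to it.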


[1] In fact, I wrote something in DDJ quite a while ago on the subject. Due to a different implementation, I also arrived at quite different conclusions than those I present in this post.
