In the previous post of this series, we left off where we were asking ourselves if there was a better way than the `selection` algorithm of finding the median.

Computing the median of three numbers is a simple as sorting the three numbers (an operation that can be done in constant time, after all, if comparing and swapping are constant time) and picking the middle. However, if the objects compared are “heavy”, comparing and (especially) moving them around may be expensive.

One possibility, is to use comparison, but *no* swapping, to find the median. Basically, for three numbers , , , we want to know which one of the six following arrangements

,

,

,

,

,

or

holds.

The goal is therefore to devise a testing procedure that determines, in an optimal number of steps, which one of the above six arrangements corresponds to the actual values of , , and . We can proceed by elimination. Testing any two pairs, say , will split the above six arrangements in two groups; one in which holds (with three arrangements) and one in which it does not (also with three arrangements). Each group can be further divided by asking another question, say, , which also creates two sub-groups, and so on, until all groups have been broken down to exactly one arrangement. In C++, that would look pretty much like:

template <typename T> const T & median3( const T & a, const T & b, const T & c ) { if (a<b) // {a,b,c}, {a,c,b}, {c,a,b} if (b<c) return b; // {a,b,c} else // {a,c,b}, {c,a,b} if (a<c) return c; // {a,c,b} else return a; // {c,a,b} else // {b,a,c}, {b,c,a}, {c,b,a} if (b<c) // {b,a,c}, {b,c,a} if (a<c) return a; // {b,a,c} else return c; // {b,c,a} else return b; // {c,b,a} }

You understood that the method works (it will ask essentially questions for numbers, and it does not move object around), but also that the resulting if-tree grows rapidly in the number of items to sort. That is, good luck if you have to find the median of 25 numbers.

Moreover, this method is branch-intensive: it will branch randomly, and will be difficult to predict by the branch predictor, therefore (probably) harming performance quite a bit.

If we allow swaps (and consider them inexpensive), we can do something using sorting networks.

A *sorting network* can be understood as a train-yard sorting station metaphor. You have lanes, and along the lanes, you have a certain number of exchanging junctions onto other lanes. That is, if a train on a lane switches to lane , then the train on lane switches to lane .

The goal is to figure out the most efficient network (with the smallest number of switches) for a given number of lanes. Let us see what sorting networks look like.

For or , there isn’t much to do: with , you just compare the two values and swap them if one is larger than the other (that is, , if you want increasing order). In C++, the basic operation is therefore given by:

template <typename T> void cmpexchg(T & a, T & b) { using std::swap; // ADL trick if (a>b) swap(a,b); }

and the graph representing the network is denoted as:

Where lanes are joined by lines, and circle/dots indicate that the lanes are connected through the line (this is somewhat of an electrical engineering notation, but the large dots, although redundant, make it easier to understand the graph). At , we have a network such as:

(where the color lines locate ‘sequence points’ in the network which determine which compares can be done in parallel) which would yield the code:

template <typename T> T median3_network( T a, T b, T c ) { cmpexchg(a,c); cmpexchg(a,b); cmpexchg(b,c); return b; }

One of the possible networks for is given by:

and finally, for , one of the possible solutions looks like:

…which is already rather more complex than with .

The advantage with a sorting network is that, in theory, a great number of comparison/exchange can be made in parallel (as they are mutually independent) and thus collapse the depth of the network significantly. Assuming that we can do up to comparison in parallel, the depth of such a network is (but with size ).

The disadvantage is that, in general, it is *hard* (in the co-NP-complete sense) to determine whether or not a given network is optimal for the size of the problem (or, conversely, it is NP-Complete to arrive at an optimal network)!

*

* *

So sorting networks are great if you can have parallel compare/exchange operations, but when we examine the primitive `cmpexchg(T & a, T & b)`, we see they’re not fundamentally more efficient than the if-tree based solution. If one thing, they’re much worst if the cost of swapping is high.

*

* *

Can we use a primitive such as `median3` to get the median of a much larger number of values than just three? What is a `median3` of three `median3`?

*To Be Continued…*

I usually find Branch-less variants to be more efficient, even when it means extra operations. Branch mis-predictions tend to hurt performance quite badly.

I suppose we could do something like

or we could bundle tests using

pmax*and pmin* mmx/sse instructions (not sure what level they are).[…] in the two previous parts of this series, we have looked at the selection algorithm and at sorting networks for […]