Selection, Revisited.

When we think of searching, we generally think of searching a value in a sorted collection of some sort. For a simple array, this implies the array is sorted and that we use binary search or a variant. But what if we want to search by rank? In a sorted array, that’s not very hard: the kth item is in slot k. But what if the array is not sorted?

An algorithm does just that: selection. We have discussed the selection algorithm a long time ago, here, but let’s recapitulate. The selection works a bit like QuickSort: it chooses a value somewhere in the array to act as a pivot, then partition the array in two parts: one containing values smaller or equal to the pivot, and one containing values greater or equal to the pivot. The two parts aren’t necessarily equal, but they meet in some point, say x. If k is smaller than x, then the slot k is to its left; otherwise it’s to its right. We pick the appropriate sub-array and repeat the procedure. Eventually, we arrive at a sub-array of length 1, at position k. The value it contains is the kth element.

The procedure doesn’t sort the array. It sorts just enough of it to establish what value is in slot k. But repeated calls to select will make it progressively more sorted:

selection1

selection2

selection3

selection4

selection5

selection6

This has the side effect that, depending how we chose the pivot (say in the middle of the array), we are less and less likely to fall into the algorithm’s worst case of choosing a pivot such that we end with 1 item on one side and n-1 on the other. This leads to a O(n^2) behavior, which is bad. But it we do are that unlucky, the array will be sorted after the first call to selection.

*
* *

If we keep track of all pivots, we know where the blocks formed by the successive searches are. Can we exploit this to help us search by value rather than by rank? Well, yes, somewhat. That only gives us information on the bounding boxes, and not really about the values contained. Of course, we know that all the values within a block are greater than the preceding block and smaller than the next, but we need to scan the block to know about the value they contain.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: