<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Harder, Better, Faster, Stronger</title>
	<atom:link href="http://hbfs.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://hbfs.wordpress.com</link>
	<description>Explorations in better, faster, stronger code.</description>
	<lastBuildDate>Fri, 27 Jan 2012 14:45:27 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='hbfs.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Harder, Better, Faster, Stronger</title>
		<link>http://hbfs.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://hbfs.wordpress.com/osd.xml" title="Harder, Better, Faster, Stronger" />
	<atom:link rel='hub' href='http://hbfs.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Medians (Part III)</title>
		<link>http://hbfs.wordpress.com/2012/01/24/medians-part-iii/</link>
		<comments>http://hbfs.wordpress.com/2012/01/24/medians-part-iii/#comments</comments>
		<pubDate>Tue, 24 Jan 2012 15:31:21 +0000</pubDate>
		<dc:creator>Steven Pigeon</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[C-plus-plus]]></category>
		<category><![CDATA[data structures]]></category>
		<category><![CDATA[hacks]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[heap]]></category>
		<category><![CDATA[max-heap]]></category>
		<category><![CDATA[med-heap]]></category>
		<category><![CDATA[median]]></category>
		<category><![CDATA[min-heap]]></category>
		<category><![CDATA[selection]]></category>

		<guid isPermaLink="false">http://hbfs.wordpress.com/?p=3826</guid>
		<description><![CDATA[So in the two previous parts of this series, we have looked at the selection algorithm and at sorting networks for determining efficiently the (sample) median of a series of values. In this last installment of the series, I consider an efficient (but approximate) algorithm based on heaps to compute the median. A heap is [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hbfs.wordpress.com&amp;blog=4426521&amp;post=3826&amp;subd=hbfs&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>In the <a href="" target="_blank">two</a> <a href="" target="_blank">previous</a> parts of this series, we looked at the selection algorithm and at sorting networks for efficiently determining the (sample) median of a series of values.</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/split-rock-small.jpg"><img src="http://hbfs.files.wordpress.com/2011/12/split-rock-small.jpg?w=450" alt="" title="split-rock-small"   class="aligncenter size-full wp-image-3812" /></a></p>
<p>In this last installment of the series, I consider an efficient (but approximate) algorithm based on heaps to compute the median.</p>
<p><span id="more-3826"></span></p>
<p>A <a href="http://en.wikipedia.org/wiki/Heap_%28data_structure%29" target="_blank">heap</a> is an efficient tree-like data structure used to maintain, for example, <a href="http://en.wikipedia.org/wiki/Priority_queue" target="_blank">priority queues</a>. The basic invariant of a max-heap (where we are interested in knowing the largest value; there&#8217;s also a min-heap where we want to know the smallest value) is that every internal node (that is, every node that is not a leaf) contains a key that is at least as large as its children&#8217;s keys. If this invariant holds throughout the heap, then the root contains the maximum value in the heap (and, for a min-heap, the minimum value). A max-heap would look something like this:</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/max-heap.png"><img src="http://hbfs.files.wordpress.com/2011/12/max-heap.png?w=300&#038;h=146" alt="" title="max-heap" width="300" height="146" class="aligncenter size-medium wp-image-3829" /></a></p>
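<p>As a rough illustration of the invariant (a minimal sketch, not part of the original code), the following builds a max-heap in place with the standard <tt>std::make_heap</tt> and checks the invariant by hand, using the usual array layout where the children of node <tt>i</tt> sit at <tt>2i+1</tt> and <tt>2i+2</tt>. The helper names <tt>is_max_heap</tt> and <tt>heap_max</tt> are made up for the example; C++11 also offers <tt>std::is_heap</tt> for the check.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Checks the max-heap invariant on an array-stored heap: no node may hold
// a key larger than its parent's. The parent of node i is at (i-1)/2.
template <typename T>
bool is_max_heap(const std::vector<T> & v)
 {
  for (std::size_t i=1;i<v.size();i++)
   if (v[(i-1)/2]<v[i]) return false;
  return true;
 }

// Rearranges v into a max-heap (linear time) and returns the largest
// value, which ends up at the root, v[0]. Assumes v is not empty.
template <typename T>
T heap_max(std::vector<T> & v)
 {
  assert(!v.empty());
  std::make_heap(v.begin(),v.end());
  return v.front();
 }
```

<p>Calling <tt>heap_max</tt> on a vector holding 3, 9, 1, 7, 5, 8 returns 9, and <tt>is_max_heap</tt> holds for the vector afterwards.</p>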
<p>The best part is that you can make any array a heap in linear time. But how does that help us with the median? Well, we could think of a med-heap, where, at every node (that is not a leaf), the invariant is that the node has a key that is the median amongst itself and its two children! A (very) basic med-heap class would look something like:</p>
<p><pre class="brush: cpp;">
#include &lt;cstddef&gt; // size_t
#include &lt;utility&gt; // std::swap (&lt;algorithm&gt; before C++11)
#include &lt;vector&gt;

template &lt;typename T&gt;
void cmpexchg(T &amp; a, T &amp; b) 
 { 
  using std::swap; // ADL safe
  if (a&gt;b) swap(a,b); 
 }

template &lt;typename T&gt;
const T &amp; median3( T &amp; a, T &amp; b, T &amp; c )
 {
  cmpexchg(a,c);
  cmpexchg(a,b);
  cmpexchg(b,c);
  return b;
 }

template &lt;typename T&gt;
class med_heap
 {
 private:

  std::vector&lt;T&gt; &amp; heap;

  int left_child(int i) { return 2*i+1; }
  int right_child(int i) { return 2*i+2; }
 
  void heapify()
   {
    for (int current=heap.size()/2-1;current&gt;-1;current--)
     {
      int left=left_child(current);
      int right=right_child(current);

      // check if it has two children
      // (otherwise give up)
      //
      if ( (left &lt; heap.size()) &amp;&amp; 
           (right &lt; heap.size()))
       {
        T &amp; a = heap[current];
        T &amp; b = heap[left];
        T &amp; c = heap[right];

        // median3 sorts a, b, c in place and
        // returns a reference to b, which then
        // holds the median: move it up to the
        // parent
        median3(a,b,c);
        std::swap(a,b);
       }
      else
       ; // has only one child, so
         // already &quot;median&quot;
     }
   }

 public:

  size_t size() const { return heap.size(); }

  // lets you peek at the (approximate) median
  const T &amp; median() const
   {
    return heap[0];
   }


  med_heap(std::vector&lt;T&gt; &amp; v)
   : heap(v)
   {
    heapify();
   }
 };
</pre></p>
<p>Note that the class does not copy the vector; it references an already existing one. This not only keeps the time for allocation and copy out of the measurements, it is also fair to <tt>select</tt>, which also only uses a reference to a vector. (As a side effect, building the med-heap reorders the caller&#8217;s vector.)</p>
<p>So here the magic happens in <tt>heapify()</tt>. Using the addressing described <a href="http://hbfs.wordpress.com/2009/04/07/compact-tree-storage/" target="_blank">in a previous post</a>, it becomes simple to scan the array from the middle backwards to the beginning and enforce at each step the invariant. This takes &#8220;small&#8221; linear time, because at each entry, we only need to examine three values.</p>
<p>The problem is that the median of medians is not the median of the whole data, so we trade some precision for speed. That may be a good trade-off, as we will see.</p>
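<p>To make the imprecision concrete, here is a small self-contained sketch of the same bottom-up pass written as a free function. The name <tt>approx_median</tt> and the sample values are assumptions for illustration; it is meant to mirror what <tt>heapify()</tt> does, not to replace the class.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One bottom-up pass over the array-stored tree: at every node that has
// two children, move the median of (node, left, right) into the node.
// Returns the value left at the root, an approximation of the median.
// Reorders v; assumes v is not empty.
template <typename T>
const T & approx_median(std::vector<T> & v)
 {
  using std::swap;
  assert(!v.empty());
  const std::size_t n=v.size();

  for (std::size_t i=n/2;i-->0;) // i = n/2-1 down to 0
   {
    const std::size_t l=2*i+1, r=2*i+2;
    if (r<n) // two children, otherwise leave the node alone
     {
      if (v[r]<v[l]) swap(v[l],v[r]); // now left <= right
      if (v[i]<v[l]) swap(v[i],v[l]); // now left <= node
      if (v[r]<v[i]) swap(v[i],v[r]); // now node <= right
     }
   }
  return v[0];
 }
```

<p>On the seven values 10, 1, 2, 3, 4, 5, 6 (in that order), the pass leaves 5 at the root, while the true median is 4: the root only ever sees the medians of its two subtrees, not the whole data.</p>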
<p align="center">*<br />*&emsp;*</p>
<p>OK, the real contenders so far are <tt>select</tt> and the med-heap. We will also compare against the <tt>std::sort</tt> algorithm, which is a <em>bona fide</em> baseline since it is 1) simple to use and 2) a likely solution for someone not wanting to reinvent the wheel (or not knowing about selection). The following shows times to find the median in an array of 1000 entries (with 1000 instances of the problem, the same for all three methods), with values in 0&#8230;9999:</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/times.png"><img src="http://hbfs.files.wordpress.com/2011/12/times.png?w=300&#038;h=252" alt="" title="times" width="300" height="252" class="aligncenter size-medium wp-image-3830" /></a></p>
<p>From the <a href="http://en.wikipedia.org/wiki/Box_plot" target="_blank">box plots</a>, we see that <tt>std::sort</tt> fares worse (in fact, a lot worse) than the two other alternatives. The med-heap is also significantly faster than <tt>select</tt>: 2-3&times; faster.</p>
<p>What about accuracy? Looking at the distribution of the errors, we see that 50% of the time, the relative error is within ±5%, and 95% of the time it is within ±20%. This distribution seems to be rather indifferent to the range of the values; for example, with a range of 0&#8230;255, you get more or less the same results.</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/errors.png"><img src="http://hbfs.files.wordpress.com/2011/12/errors.png?w=152&#038;h=300" alt="" title="errors" width="152" height="300" class="aligncenter size-medium wp-image-3831" /></a></p>
<p align="center">*<br />*&emsp;*</p>
<p>In essence, therefore, you trade off accuracy (±5% error 50% of the time) for a rather interesting speed-up (2-3&times;) over an exact algorithm such as <tt>select</tt>, which is an interesting result on its own. Now, whether or not a med-heap is adequate for your needs is another story. You could argue that for some applications, it is necessary to have the exact algorithm, and you could make the case where, for another application, the ±5% error 50% of the time is unimportant or unnoticeable.</p>
<p align="center">*<br />*&emsp;*</p>
<p>Here, we only have a rather sketchy implementation of a med-heap that provides only the fun part, knowing the median, but we could, just as easily as with a min- or max-heap, provide the necessary functions to pop the median and to insert and remove values from the med-heap. In fact, it would be exactly the same code as with a min- or max-heap, but for the median instead of the min or max.</p>
<p align="center">*<br />*&emsp;*</p>
<p>Full Test Code <a href="http://www.stevenpigeon.org/blogs/hbfs/med-heap.cpp" target="_blank">here</a>.</p>
<p align="center">*<br />*&emsp;*</p>
<p>This is the 300th post.</p>
<p align="center">*<br />*&emsp;*</p>
<p>The STL function <tt>nth_element</tt> seems to be doing a good job at select (and a better one than me), but is still much slower than the med-heap:</p>
<p><a href="http://hbfs.files.wordpress.com/2012/01/times-with-nth.png"><img src="http://hbfs.files.wordpress.com/2012/01/times-with-nth.png?w=300&#038;h=252" alt="" title="times-with-nth" width="300" height="252" class="aligncenter size-medium wp-image-3938" /></a></p>
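<p>For reference, an exact median with <tt>std::nth_element</tt> takes only a few lines. This is a minimal sketch: the function name <tt>exact_median</tt> is made up, and picking the element at index <tt>n/2</tt> (the upper median when <tt>n</tt> is even) is an assumption about which convention you want.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Exact median via std::nth_element, which partially reorders v so that
// v[n/2] holds the value a full sort would put there (linear time on
// average). Reorders v; assumes v is not empty.
template <typename T>
T exact_median(std::vector<T> & v)
 {
  assert(!v.empty());
  const std::size_t mid=v.size()/2;
  std::nth_element(v.begin(),v.begin()+mid,v.end());
  return v[mid];
 }
```

<p>Like the med-heap, this works in place on the caller&#8217;s vector, so no copy is made.</p>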
]]></content:encoded>
			<wfw:commentRss>http://hbfs.wordpress.com/2012/01/24/medians-part-iii/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/56c87cdd18d9a97255c0343c80fba38d?s=96&#38;d=identicon" medium="image">
			<media:title type="html">stevenpigeon</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/split-rock-small.jpg" medium="image">
			<media:title type="html">split-rock-small</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/max-heap.png?w=300" medium="image">
			<media:title type="html">max-heap</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/times.png?w=300" medium="image">
			<media:title type="html">times</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/errors.png?w=152" medium="image">
			<media:title type="html">errors</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2012/01/times-with-nth.png?w=300" medium="image">
			<media:title type="html">times-with-nth</media:title>
		</media:content>
	</item>
		<item>
		<title>Wallpaper: Frontières imaginées</title>
		<link>http://hbfs.wordpress.com/2012/01/21/wallpaper-frontieres-imaginees/</link>
		<comments>http://hbfs.wordpress.com/2012/01/21/wallpaper-frontieres-imaginees/#comments</comments>
		<pubDate>Sun, 22 Jan 2012 03:06:04 +0000</pubDate>
		<dc:creator>Steven Pigeon</dc:creator>
				<category><![CDATA[Wallpapers]]></category>
		<category><![CDATA[Zen]]></category>
		<category><![CDATA[wallpaper]]></category>
		<category><![CDATA[wallpapers]]></category>

		<guid isPermaLink="false">http://hbfs.wordpress.com/?p=3931</guid>
		<description><![CDATA[You can find more wallpapers here Filed under: Wallpapers, Zen Tagged: wallpaper, wallpapers<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hbfs.wordpress.com&amp;blog=4426521&amp;post=3931&amp;subd=hbfs&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<div id="attachment_3932" class="wp-caption aligncenter" style="width: 310px"><a href="http://hbfs.files.wordpress.com/2012/01/0105.jpg"><img src="http://hbfs.files.wordpress.com/2012/01/0105.jpg?w=300&#038;h=187" alt="" title="0105" width="300" height="187" class="size-medium wp-image-3932" /></a><p class="wp-caption-text">(Frontières imaginées, 1920×1200)</p></div>
<p>You can find more wallpapers <a href="http://www.stevenpigeon.org/Wallpapers/index.html" target="_blank">here</a></p>
]]></content:encoded>
			<wfw:commentRss>http://hbfs.wordpress.com/2012/01/21/wallpaper-frontieres-imaginees/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/56c87cdd18d9a97255c0343c80fba38d?s=96&#38;d=identicon" medium="image">
			<media:title type="html">stevenpigeon</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2012/01/0105.jpg?w=300" medium="image">
			<media:title type="html">0105</media:title>
		</media:content>
	</item>
		<item>
		<title>Wallpaper: Sibérie minimaliste</title>
		<link>http://hbfs.wordpress.com/2012/01/21/wallpaper-siberie-minimaliste/</link>
		<comments>http://hbfs.wordpress.com/2012/01/21/wallpaper-siberie-minimaliste/#comments</comments>
		<pubDate>Sun, 22 Jan 2012 03:03:56 +0000</pubDate>
		<dc:creator>Steven Pigeon</dc:creator>
				<category><![CDATA[Wallpapers]]></category>
		<category><![CDATA[Zen]]></category>
		<category><![CDATA[wallpaper]]></category>
		<category><![CDATA[wallpapers]]></category>

		<guid isPermaLink="false">http://hbfs.wordpress.com/?p=3925</guid>
		<description><![CDATA[You can find more wallpapers here Filed under: Wallpapers, Zen Tagged: wallpaper, wallpapers<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hbfs.wordpress.com&amp;blog=4426521&amp;post=3925&amp;subd=hbfs&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<div id="attachment_3926" class="wp-caption aligncenter" style="width: 310px"><a href="http://hbfs.files.wordpress.com/2012/01/0106.jpg"><img src="http://hbfs.files.wordpress.com/2012/01/0106.jpg?w=300&#038;h=187" alt="" title="0106" width="300" height="187" class="size-medium wp-image-3926" /></a><p class="wp-caption-text">(Sibérie Minimaliste, 1920×1200)</p></div>
<p>You can find more wallpapers <a href="http://www.stevenpigeon.org/Wallpapers/index.html" target="_blank">here</a></p>
]]></content:encoded>
			<wfw:commentRss>http://hbfs.wordpress.com/2012/01/21/wallpaper-siberie-minimaliste/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/56c87cdd18d9a97255c0343c80fba38d?s=96&#38;d=identicon" medium="image">
			<media:title type="html">stevenpigeon</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2012/01/0106.jpg?w=300" medium="image">
			<media:title type="html">0106</media:title>
		</media:content>
	</item>
		<item>
		<title>Lossless Coding of CD Audio</title>
		<link>http://hbfs.wordpress.com/2012/01/17/lossless-coding-of-cd-audio/</link>
		<comments>http://hbfs.wordpress.com/2012/01/17/lossless-coding-of-cd-audio/#comments</comments>
		<pubDate>Tue, 17 Jan 2012 15:18:57 +0000</pubDate>
		<dc:creator>Steven Pigeon</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[Bash (Shell)]]></category>
		<category><![CDATA[data compression]]></category>
		<category><![CDATA[embedded programming]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[CD]]></category>
		<category><![CDATA[Entropy]]></category>
		<category><![CDATA[Golomb Codes]]></category>
		<category><![CDATA[GSM]]></category>
		<category><![CDATA[Linear Prediction]]></category>
		<category><![CDATA[music]]></category>
		<category><![CDATA[psychoacoustic model]]></category>
		<category><![CDATA[psychoacoustics]]></category>
		<category><![CDATA[sound]]></category>
		<category><![CDATA[Speech]]></category>
		<category><![CDATA[Speech Codec]]></category>

		<guid isPermaLink="false">http://hbfs.wordpress.com/?p=3872</guid>
		<description><![CDATA[Once upon a time, I discussed how to pick bit-rate for MP3, while considering re-ripping all my CDs. But if I&#8217;m to re-rip everything, I might as well do it one last time and use lossless compression. In this post, we&#8217;ll discuss the simple script I cooked up to do just that, and a bit [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hbfs.wordpress.com&amp;blog=4426521&amp;post=3872&amp;subd=hbfs&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Once upon a time, I discussed how to <a href="http://hbfs.wordpress.com/2010/04/06/picking-a-bit-rate-for-mp3-files/" target="_blank">pick bit-rate</a> for MP3, while considering re-ripping all my CDs. But if I&#8217;m to re-rip everything, I might as well do it <em>one last time</em> and use <a href="http://en.wikipedia.org/wiki/Lossless_data_compression" target="_blank">lossless compression</a>.</p>
<p><a href="http://hbfs.files.wordpress.com/2010/02/gramophone_19142.png"><img src="http://hbfs.files.wordpress.com/2010/02/gramophone_19142.png?w=121&#038;h=150" alt="" title="Gramophone_1914" width="121" height="150" class="aligncenter size-thumbnail wp-image-2165" /></a></p>
<p>In this post, we&#8217;ll discuss the simple script I cooked up to do just that, and a bit on how <a href="http://en.wikipedia.org/wiki/FLAC" target="_blank">Flac</a> works, and how it compares to MP3.</p>
<p><span id="more-3872"></span></p>
<p>First, to run the script, you will need <tt>cdparanoia</tt> and <tt>flac</tt>, both packages readily available from Debian/Ubuntu repositories (and, for the MP3 version of the script, found at the end of this post, you will need <tt>lame</tt>, also found in the default repositories).</p>
<p>The first thing to do is to probe the CD for its Table Of Contents (TOC hereafter). Fortunately, <tt>cdparanoia</tt> is a fine piece of software and it detects the default/loaded CD/DVD drive. It suffices to invoke it with the probe option, like this:</p>
<p><pre class="brush: bash;">
echo probing...

if cdparanoia -Q 2&gt; cd-rip-out.tmp~
then
...
</pre></p>
<p>This finds the drive and fetches the TOC, here written to a temporary file. Redirection from <tt>stderr</tt> is needed because <tt>cdparanoia</tt> outputs to <tt>stderr</tt> (for some reason). This temporary file is then parsed to show the tracks and get their numbers:</p>
<p><pre class="brush: bash;">
grep -e '^[[:blank:]]*[0-9]*\.' cd-rip-out.tmp~ &gt; cd-rip-toc.tmp~
tracks=$( cut -d. -f 1 cd-rip-toc.tmp~ )
</pre></p>
<p>(The exact <a href="http://en.wikipedia.org/wiki/Poutine" target="_blank">poutine</a> is <tt>cdparanoia</tt>-specific.) In short, it gets the track numbers and forms a list such as <tt>1 2 3 4</tt>. Later on, the user can either choose the default list or specify their own, including <tt>cdparanoia</tt>-specific syntax such as <tt>12-15</tt> to grab tracks 12 to 15, inclusive, as a single track. We also add a couple of cosmetic details such as promoting track numbers from <tt>1</tt> to <tt>001</tt> (so that they all line up all pretty), ask for the track title, and invoke <tt>flac</tt>:</p>
<p><pre class="brush: bash; wrap-lines: false;">
    # main grab loop
    for track in ${track_list[@]}
    do
        ((t++))
        # sed hack from Vincent &quot;gnuvince&quot; Foley
        beautiful_track=$(echo $track \
        | sed 's/\([0-9]*\)/000\1/g;s/[0-9]*\([0-9]\{3\}\)/\1/g')

        read -p &quot;tracks $beautiful_track ($t/$t_l) title: &quot; this_track_title 
        this_track_name=&quot;$artist -- $album - $beautiful_track - $this_track_title&quot;

        if cdparanoia --never-skip=2 $track -w &quot;$this_track_name.wav&quot;
        then
            # grab successful!

            flac --best \
                &quot;$this_track_name.wav&quot; \
                --tag=artist=&quot;$artist&quot; \
                --tag=album=&quot;$album&quot; \
                --tag=track=&quot;$beautiful_track&quot; \
                --tag=title=&quot;$beautiful_track $this_track_title&quot; \
                --tag=year=&quot;$year&quot;
        else
            echo An error occurred while grabbing $track
        fi
    done
</pre></p>
<p>(The ugly sed hack to convert track numbers from 1 to 001 is due to <a href="http://gnuvince.wordpress.com/" target="_blank">gnuvince</a>.)</p>
<p>Flac provides many options, but it seems that <tt>--best</tt> does indeed provide very good parameters, choosing the best compression settings regardless of CPU time (and, let us be serious, it&#8217;s rather unimportant that it takes &#8220;more&#8221; CPU time when it takes about 30s to compress an entire CD on a modern CPU&mdash;once the sound is ripped from the CD).</p>
<p align="center">*<br />*&emsp;*</p>
<p>Flac encodes sound losslessly, that is, if you decompress a <tt>.flac</tt> file, you get, bit for bit, the <tt>.wav</tt> you input. MP3, on the other hand, is a conspicuously lossy encoder, that is, it <em>destroys</em> information in order to achieve much higher compression ratios. One essential piece of a modern MP3 encoder is its psycho-acoustic model, an engine that analyzes sound and determines, with quantization enabled, what the perceived loss incurred by destroying this or that feature in the sound will be. Needless to say, the better the model is, the better the sound quality can be, because the encoder will be able to make the right decisions about what information to destroy: the sound information that you should hear the least.</p>
<p>But at very low bit-rates, it&#8217;s the quantization/decimation engine that dominates. To meet the very low bit-rates, the encoder will destroy plenty of information, and even if the psycho-acoustic model helps it make the right decisions while coding, the result will still be a sound with lots of artifacts, and it will not sound that good. Effects will be noticeable or even irritating. At high bit-rates, however, it will be the psycho-acoustic model that will dominate. At high bit-rates, there is no need to fiercely quantize and decimate, but the psycho-acoustic model can still force the encoder to remove features in the sound that are deemed inaudible (according to the model engine) <em>despite having enough bits available to code them</em>. In this regime, cranking up the bit-rate will result in only a marginal file size increase, because what extra bits you throw at the encoding are lost to the model that removes features from the sound.</p>
<p>We have seen this before, haven&#8217;t we?</p>
<div id="attachment_2168" class="wp-caption aligncenter" style="width: 160px"><a href="http://hbfs.files.wordpress.com/2010/02/relative-filesizes.png"><img src="http://hbfs.files.wordpress.com/2010/02/relative-filesizes.png?w=150&#038;h=94" alt="" title="relative-filesizes" width="150" height="94" class="size-thumbnail wp-image-2168" /></a><p class="wp-caption-text">Relative file sizes for --vbr-new</p></div>
<p>In the graph, we can see that some files do not grow as the bit-rate increases, while others grow only somewhat; at least, quite sub-linearly.</p>
<p align="center">*<br />*&emsp;*</p>
<p>With lossless coding, we keep <em>everything</em>, so it is crucial that we have a good prediction model, that is, an engine that can give a good prediction on the next sample <img src='http://s0.wp.com/latex.php?latex=x_t&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='x_t' title='x_t' class='latex' /> from the previous <img src='http://s0.wp.com/latex.php?latex=w&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='w' title='w' class='latex' /> samples, <img src='http://s0.wp.com/latex.php?latex=x_%7Bt-1%7D%2C+x_%7Bt-2%7D%2C+%5Cldots%2C+x_%7Bt-w%7D&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='x_{t-1}, x_{t-2}, &#92;ldots, x_{t-w}' title='x_{t-1}, x_{t-2}, &#92;ldots, x_{t-w}' class='latex' />. If the error <img src='http://s0.wp.com/latex.php?latex=%5Chat%7Bx%7D_t-x_t&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;hat{x}_t-x_t' title='&#92;hat{x}_t-x_t' class='latex' /> is (<a href="http://en.wikipedia.org/wiki/Expected_value" target="_blank">expectedly</a>) small (well, to be exact, if it has a small <a href="http://en.wikipedia.org/wiki/Entropy_%28information_theory%29" target="_blank">entropy</a>), then we can devise an efficient code for it and get good compression.</p>
<p>Flac does not use an elaborate sound model. It uses <a href="http://en.wikipedia.org/wiki/Linear_prediction" target="_blank">linear prediction</a> to predict values. Linear prediction can be written as:</p>
<p align="center"><img src='http://s0.wp.com/latex.php?latex=%5Chat%7Bx%7D_t+%3D+%5Csum_%7Bi%3D1%7D%5E%7Bw%7Da_i+x_%7Bt-i%7D&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;hat{x}_t = &#92;sum_{i=1}^{w}a_i x_{t-i}' title='&#92;hat{x}_t = &#92;sum_{i=1}^{w}a_i x_{t-i}' class='latex' /></p>
<p>(or, in vector notation, <img src='http://s0.wp.com/latex.php?latex=%5Chat%7Bx%7D_t%3Da%5ETx_%7Bt-1%7D%5E%7Bt-w%7D&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;hat{x}_t=a^Tx_{t-1}^{t-w}' title='&#92;hat{x}_t=a^Tx_{t-1}^{t-w}' class='latex' />&mdash;it <em>is</em> only a dot-product, after all.) At each time <img src='http://s0.wp.com/latex.php?latex=t&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='t' title='t' class='latex' />, a new set of values <img src='http://s0.wp.com/latex.php?latex=a_i&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='a_i' title='a_i' class='latex' /> are computed in order to minimize the expected error <img src='http://s0.wp.com/latex.php?latex=%28%5Chat%7Bx%7D_%7Bt%2B1%7D-x_%7Bt%2B1%7D%29%5E2&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='(&#92;hat{x}_{t+1}-x_{t+1})^2' title='(&#92;hat{x}_{t+1}-x_{t+1})^2' class='latex' /> (there are possible variants) for the next time step. Turns out it&#8217;s not too intensive if the window is small-ish (especially for somewhat recent CPUs), because if you need to invert a <img src='http://s0.wp.com/latex.php?latex=w%5Ctimes%7Bw%7D&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='w&#92;times{w}' title='w&#92;times{w}' class='latex' /> matrix, it is of a special kind, it is a <a href="http://en.wikipedia.org/wiki/Toeplitz_matrix" target="_blank">Toeplitz Matrix</a>. 
Because of the structure of the matrix, it has <img src='http://s0.wp.com/latex.php?latex=O%28w%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(w)' title='O(w)' class='latex' /> degrees of freedom rather than <img src='http://s0.wp.com/latex.php?latex=O%28w%5E2%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(w^2)' title='O(w^2)' class='latex' />, and the inversion of a Toeplitz matrix (or solving a Toeplitz system; same difference) can be performed in <em>much</em> better than <img src='http://s0.wp.com/latex.php?latex=O%28w%5E3%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(w^3)' title='O(w^3)' class='latex' />: it can be done in <img src='http://s0.wp.com/latex.php?latex=O%28w%5E2%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(w^2)' title='O(w^2)' class='latex' />!</p>
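<p>To make the quadratic-time claim concrete, here is a minimal sketch of the classic Levinson-Durbin recursion, one well-known way of solving such a Toeplitz system. It is not taken from the Flac sources; the names <tt>levinson_durbin</tt> and <tt>predict</tt>, and the choice of passing the autocorrelation values <tt>r[0..w]</tt> of the signal as input, are assumptions made for illustration.</p>
<p><pre class="brush: cpp;">
#include &lt;cstddef&gt;
#include &lt;vector&gt;

// Illustrative sketch (not Flac's code): solves the Toeplitz normal
// equations of linear prediction with the Levinson-Durbin recursion.
// r[k] is the autocorrelation at lag k, with r.size() == w+1.
// Returns a[0..w]; a[0] is unused, a[1..w] are the coefficients.
std::vector&lt;double&gt; levinson_durbin(const std::vector&lt;double&gt; &amp; r)
 {
  if (r.empty()) return {};
  const std::size_t w = r.size() - 1;
  std::vector&lt;double&gt; a(w + 1, 0.0);
  double err = r[0]; // prediction error energy at the current order

  for (std::size_t m = 1; m &lt;= w &amp;&amp; err &gt; 0.0; ++m)
   {
    // reflection coefficient for order m
    double acc = r[m];
    for (std::size_t i = 1; i &lt; m; ++i) acc -= a[i] * r[m - i];
    const double k = acc / err;

    // order-m solution from the order-(m-1) one: O(m) work,
    // hence O(w^2) for the whole recursion
    std::vector&lt;double&gt; next(a);
    next[m] = k;
    for (std::size_t i = 1; i &lt; m; ++i) next[i] = a[i] - k * a[m - i];
    a.swap(next);

    err *= (1.0 - k * k);
   }
  return a;
 }

// The prediction itself: xhat_t = sum of a_i * x_{t-i} for i = 1..w.
// Requires t &gt;= w.
double predict(const std::vector&lt;double&gt; &amp; a,
               const std::vector&lt;double&gt; &amp; x,
               std::size_t t)
 {
  double xhat = 0.0;
  for (std::size_t i = 1; i &lt; a.size(); ++i) xhat += a[i] * x[t - i];
  return xhat;
 }
</pre></p>
<p>For instance, with autocorrelations 1, 0.5, 0.25 (a signal where each sample is about half the previous one), the recursion yields a first coefficient of 0.5 and a second of 0: the second-to-last sample brings nothing more to the prediction.</p>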
<p>Linear Prediction Coding isn&#8217;t something new; it&#8217;s been in use in various <a href="http://en.wikipedia.org/wiki/Speech_coding" target="_blank">speech coding</a> schemes (such as the <a href="http://en.wikipedia.org/wiki/GSM#Voice_codecs" target="_blank">GSM Speech Codecs</a>). What makes it attractive is that, well, it seems to work fairly well on speech and sound, and that there are efficient algorithms to solve for the parameters of the linear prediction.</p>
<p align="center">*<br />*&emsp;*</p>
<p>Once the prediction <img src='http://s0.wp.com/latex.php?latex=%5Chat%7Bx%7D_t&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;hat{x}_t' title='&#92;hat{x}_t' class='latex' /> is done, the difference <img src='http://s0.wp.com/latex.php?latex=x_t-%5Chat%7Bx%7D_t&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='x_t-&#92;hat{x}_t' title='x_t-&#92;hat{x}_t' class='latex' /> is coded. In Flac, it seems that this difference is considered to be <a href="http://en.wikipedia.org/wiki/Geometric_distribution" target="_blank">geometrically distributed</a>, and is accordingly encoded using a <a href="http://en.wikipedia.org/wiki/Golomb_coding" target="_blank">Golomb Code</a>. It is the <em>exact</em> encoding of the residual <img src='http://s0.wp.com/latex.php?latex=e_t%3Dx_t-%5Chat%7Bx%7D_t&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='e_t=x_t-&#92;hat{x}_t' title='e_t=x_t-&#92;hat{x}_t' class='latex' /> that makes the recovery of the original signal possible: indeed, <img src='http://s0.wp.com/latex.php?latex=x_t%3D%5Chat%7Bx%7D_t%2Be_t&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='x_t=&#92;hat{x}_t+e_t' title='x_t=&#92;hat{x}_t+e_t' class='latex' />.</p>
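<p>As a rough illustration of how a residual can be Golomb-coded, here is a sketch of a Rice code, the special case of a Golomb code whose parameter is a power of two. The function names <tt>fold</tt> and <tt>rice_encode</tt>, the mapping of signed residuals onto non-negative integers, and the bit conventions (quotient as a run of ones closed by a zero, bits returned as a string) are all assumptions chosen for readability; the actual Flac bitstream may well differ in its details.</p>
<p><pre class="brush: cpp;">
#include &lt;cstdint&gt;
#include &lt;string&gt;

// Illustrative sketch, not Flac's actual bitstream format.
// Folds a signed residual onto the non-negative integers:
// 0, -1, 1, -2, 2, ... becomes 0, 1, 2, 3, 4, ...
std::uint32_t fold(std::int32_t e)
 {
  return e &gt;= 0 ? static_cast&lt;std::uint32_t&gt;(e) * 2u
                : static_cast&lt;std::uint32_t&gt;(-(e + 1)) * 2u + 1u;
 }

// Rice code of e with parameter 2^k (k &lt; 32), as a string of '0'/'1'.
std::string rice_encode(std::int32_t e, unsigned k)
 {
  const std::uint32_t u = fold(e);
  const std::uint32_t q = u &gt;&gt; k; // quotient by 2^k

  std::string bits(q, '1'); // quotient in unary: q ones...
  bits += '0';              // ...closed by a zero

  // remainder on exactly k bits, most significant bit first
  for (unsigned i = k; i-- &gt; 0; )
    bits += ((u &gt;&gt; i) &amp; 1u) ? '1' : '0';

  return bits;
 }
</pre></p>
<p>With <tt>k=2</tt>, a residual of 0 costs three bits (<tt>000</tt>) and a residual of -3 costs four (<tt>1001</tt>): small residuals get short codes, which is exactly what a good predictor is meant to produce.</p>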
<p align="center">*<br />*&emsp;*</p>
<p>You can download the full scripts <a href="http://www.stevenpigeon.org/blogs/hbfs/cd-rip.sh" target="_blank">here for Flac</a> and <a href="http://www.stevenpigeon.org/blogs/hbfs/cd-rip-mp3.sh" target="_blank">here for MP3</a> high bit-rate VBR. Feel free to tinker with them; that&#8217;s what they&#8217;re for.</p>
<p align="center">*<br />*&emsp;*</p>
<p>The <a href="http://flac.sourceforge.net/" target="_blank">Flac</a> homepage is on SourceForge; follow the link to learn more about Flac.</p>
<br />Filed under: <a href='http://hbfs.wordpress.com/category/algorithms/'>algorithms</a>, <a href='http://hbfs.wordpress.com/category/bash-shell/'>Bash (Shell)</a>, <a href='http://hbfs.wordpress.com/category/data-compression/'>data compression</a>, <a href='http://hbfs.wordpress.com/category/embedded-programming/'>embedded programming</a>, <a href='http://hbfs.wordpress.com/category/mathematics/'>Mathematics</a>, <a href='http://hbfs.wordpress.com/category/programming/'>programming</a> Tagged: <a href='http://hbfs.wordpress.com/tag/cd/'>CD</a>, <a href='http://hbfs.wordpress.com/tag/entropy/'>Entropy</a>, <a href='http://hbfs.wordpress.com/tag/golomb-codes/'>Golomb Codes</a>, <a href='http://hbfs.wordpress.com/tag/gsm/'>GSM</a>, <a href='http://hbfs.wordpress.com/tag/linear-prediction/'>Linear Prediction</a>, <a href='http://hbfs.wordpress.com/tag/music/'>music</a>, <a href='http://hbfs.wordpress.com/tag/psychoacoustic-model/'>psychoacoustic model</a>, <a href='http://hbfs.wordpress.com/tag/psychoacoustics/'>psychoacoustics</a>, <a href='http://hbfs.wordpress.com/tag/sound/'>sound</a>, <a href='http://hbfs.wordpress.com/tag/speech/'>Speech</a>, <a href='http://hbfs.wordpress.com/tag/speech-codec/'>Speech Codec</a>]]></content:encoded>
			<wfw:commentRss>http://hbfs.wordpress.com/2012/01/17/lossless-coding-of-cd-audio/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/56c87cdd18d9a97255c0343c80fba38d?s=96&#38;d=identicon" medium="image">
			<media:title type="html">stevenpigeon</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2010/02/gramophone_19142.png?w=121" medium="image">
			<media:title type="html">Gramophone_1914</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2010/02/relative-filesizes.png?w=150" medium="image">
			<media:title type="html">relative-filesizes</media:title>
		</media:content>
	</item>
		<item>
		<title>Suggested Reading: National Audubon Society Guide To Photographing National Parks</title>
		<link>http://hbfs.wordpress.com/2012/01/15/suggested-reading-national-audubon-society-guite-to-photographing-national-parks/</link>
		<comments>http://hbfs.wordpress.com/2012/01/15/suggested-reading-national-audubon-society-guite-to-photographing-national-parks/#comments</comments>
		<pubDate>Mon, 16 Jan 2012 04:37:34 +0000</pubDate>
		<dc:creator>Steven Pigeon</dc:creator>
				<category><![CDATA[Suggested Reading]]></category>
		<category><![CDATA[Audubon]]></category>
		<category><![CDATA[Landscape]]></category>
		<category><![CDATA[National Audubon]]></category>
		<category><![CDATA[National Parks]]></category>
		<category><![CDATA[Photography]]></category>

		<guid isPermaLink="false">http://hbfs.wordpress.com/?p=3911</guid>
		<description><![CDATA[Tim Fitzharris &#8212;&#160;National Audubon Society Guide To Photographing National Parks (Digital Edition)&#160;&#8212; Firefly Books 2009, 192 pp. ISBN&#160;978-1-55407-455-6 Contrary to what the title may indicate, this book has little to do with actual digital photography techniques for landscapes or natural wonders; it is rather a tourist guide of everything you should see in the major [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hbfs.wordpress.com&amp;blog=4426521&amp;post=3911&amp;subd=hbfs&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Tim Fitzharris &mdash;&nbsp;<a href="http://www.amazon.com/gp/product/155407455X/ref=as_li_qf_sp_asin_tl?ie=UTF8&amp;tag=hardbettfasts-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=155407455X" target="_blank"><i>National Audubon Society Guide To Photographing National Parks (Digital Edition)</i></a>&nbsp;&mdash; Firefly Books 2009, 192 pp. ISBN&nbsp;978-1-55407-455-6</p>
<div id="attachment_3914" class="wp-caption aligncenter" style="width: 150px"><a href="http://www.amazon.com/gp/product/155407455X/ref=as_li_qf_sp_asin_tl?ie=UTF8&amp;tag=hardbettfasts-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=155407455X"><img src="http://hbfs.files.wordpress.com/2012/01/audubon-national-parks.jpg?w=450" alt="" title="audubon-national-parks"   class="size-full wp-image-3914" /></a><p class="wp-caption-text">(Buy At Amazon.com)</p></div>
<p>Contrary to what the title may indicate, this book has little to do with actual digital photography techniques for landscapes or natural wonders; it is rather a tourist guide to everything you should see in the major American national parks. After dispatching the basics of landscape photography, the guide leads you on a detailed journey through the major national parks of the United States, giving you all the good hints as to how to reach this-or-that natural wonder, and what time of year, or even what time of day, lends itself best to photography.</p>
<p>While this seems only moderately interesting for a foreigner (especially if one doesn&#8217;t really want to visit all the parks), the book is entirely redeemed by Fitzharris&#8217; <em>absolutely superb</em> photography. A must-have, especially if you&#8217;re interested in landscape photography.</p>
<br />Filed under: <a href='http://hbfs.wordpress.com/category/suggested-reading/'>Suggested Reading</a> Tagged: <a href='http://hbfs.wordpress.com/tag/audubon/'>Audubon</a>, <a href='http://hbfs.wordpress.com/tag/landscape/'>Landscape</a>, <a href='http://hbfs.wordpress.com/tag/national-audubon/'>National Audubon</a>, <a href='http://hbfs.wordpress.com/tag/national-parks/'>National Parks</a>, <a href='http://hbfs.wordpress.com/tag/photography/'>Photography</a>]]></content:encoded>
			<wfw:commentRss>http://hbfs.wordpress.com/2012/01/15/suggested-reading-national-audubon-society-guite-to-photographing-national-parks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/56c87cdd18d9a97255c0343c80fba38d?s=96&#38;d=identicon" medium="image">
			<media:title type="html">stevenpigeon</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2012/01/audubon-national-parks.jpg" medium="image">
			<media:title type="html">audubon-national-parks</media:title>
		</media:content>
	</item>
		<item>
		<title>Suggested Reading: The Practical Pyromaniac</title>
		<link>http://hbfs.wordpress.com/2012/01/15/suggested-reading-the-practical-pyromaniac/</link>
		<comments>http://hbfs.wordpress.com/2012/01/15/suggested-reading-the-practical-pyromaniac/#comments</comments>
		<pubDate>Mon, 16 Jan 2012 04:23:00 +0000</pubDate>
		<dc:creator>Steven Pigeon</dc:creator>
				<category><![CDATA[Suggested Reading]]></category>
		<category><![CDATA[flame-thrower]]></category>
		<category><![CDATA[mischief]]></category>
		<category><![CDATA[pyromaniac]]></category>

		<guid isPermaLink="false">http://hbfs.wordpress.com/?p=3903</guid>
		<description><![CDATA[William Gurstelle &#8212;&#160;The Practical Pyromaniac: Build Fire Tornadoes, One-Candlepower Engines, Great Balls of Fire, and More Incendiary Devices&#160;&#8212; Chicago Review Press, 2011, 212 pp. ISBN&#160;978-1-56976-710-8 This book is quite the step up from Mini Weapons Of Mass Destruction 2: not only the projects presented are a lot more interesting&#8212;flame-thrower and all&#8212;they are presented in their [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hbfs.wordpress.com&amp;blog=4426521&amp;post=3903&amp;subd=hbfs&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>William Gurstelle &mdash;&nbsp;<i><a href="http://www.amazon.com/gp/product/1569767106/ref=as_li_qf_sp_asin_tl?ie=UTF8&amp;tag=hardbettfasts-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=1569767106" target="_blank">The Practical Pyromaniac</a>: Build Fire Tornadoes, One-Candlepower Engines, Great Balls of Fire, and More Incendiary Devices</i>&nbsp;&mdash; Chicago Review Press, 2011, 212 pp. ISBN&nbsp;978-1-56976-710-8</p>
<div id="attachment_3905" class="wp-caption aligncenter" style="width: 150px"><a href="http://www.amazon.com/gp/product/1569767106/ref=as_li_qf_sp_asin_tl?ie=UTF8&amp;tag=hardbettfasts-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=1569767106"><img src="http://hbfs.files.wordpress.com/2012/01/practical-pyro.jpg?w=450" alt="" title="practical-pyro"   class="size-full wp-image-3905" /></a><p class="wp-caption-text">(Buy At Amazon)</p></div>
<p>This book is quite the step up from <a href="http://hbfs.wordpress.com/2011/12/30/suggested-reading-mini-weapons-of-mass-destruction-2/" target="_blank">Mini Weapons Of Mass Destruction 2</a>: not only are the projects presented a lot more interesting (flame-thrower and all), they are also presented in their historical and scientific contexts, with an explanation of the active principles of each device or contraption. While <i>Mini Weapons</i> was something of a kid&#8217;s book, <a href="http://www.amazon.com/gp/product/1569767106/ref=as_li_qf_sp_asin_tl?ie=UTF8&amp;tag=hardbettfasts-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=1569767106" target="_blank"><i>The Practical Pyromaniac</i></a> certainly <em>isn&#8217;t</em>. I must say it will appeal quite a lot to everyone with a little <em>envie de destruction</em>.</p>
<br />Filed under: <a href='http://hbfs.wordpress.com/category/suggested-reading/'>Suggested Reading</a> Tagged: <a href='http://hbfs.wordpress.com/tag/flame-thrower/'>flame-thrower</a>, <a href='http://hbfs.wordpress.com/tag/mischief/'>mischief</a>, <a href='http://hbfs.wordpress.com/tag/pyromaniac/'>pyromaniac</a>]]></content:encoded>
			<wfw:commentRss>http://hbfs.wordpress.com/2012/01/15/suggested-reading-the-practical-pyromaniac/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/56c87cdd18d9a97255c0343c80fba38d?s=96&#38;d=identicon" medium="image">
			<media:title type="html">stevenpigeon</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2012/01/practical-pyro.jpg" medium="image">
			<media:title type="html">practical-pyro</media:title>
		</media:content>
	</item>
		<item>
		<title>Medians (Part II)</title>
		<link>http://hbfs.wordpress.com/2012/01/10/medians-part-ii/</link>
		<comments>http://hbfs.wordpress.com/2012/01/10/medians-part-ii/#comments</comments>
		<pubDate>Tue, 10 Jan 2012 15:42:46 +0000</pubDate>
		<dc:creator>Steven Pigeon</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[C-plus-plus]]></category>
		<category><![CDATA[data structures]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[ADL]]></category>
		<category><![CDATA[cmpxchg]]></category>
		<category><![CDATA[median]]></category>
		<category><![CDATA[Sorting Networks]]></category>

		<guid isPermaLink="false">http://hbfs.wordpress.com/?p=3815</guid>
		<description><![CDATA[In the previous post of this series, we left off where we were asking ourselves if there was a better way than the selection algorithm of finding the median. Computing the median of three numbers is a simple as sorting the three numbers (an operation that can be done in constant time, after all, if [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hbfs.wordpress.com&amp;blog=4426521&amp;post=3815&amp;subd=hbfs&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>In the <a href="http://hbfs.wordpress.com/2011/12/27/medians-part-i/" target="_blank">previous post of this series</a>, we left off where we were asking ourselves if there was a better way than the <tt>selection</tt> algorithm of finding the median.</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/split-rock-small.jpg"><img src="http://hbfs.files.wordpress.com/2011/12/split-rock-small.jpg?w=450" alt="" title="split-rock-small"   class="aligncenter size-full wp-image-3812" /></a></p>
<p>Computing the median of three numbers is as simple as sorting the three numbers (an operation that can be done in constant time, after all, if comparing and swapping are constant time) and picking the middle one. However, if the objects compared are &#8220;heavy&#8221;, comparing and (especially) moving them around may be expensive.</p>
<p><span id="more-3815"></span></p>
<p>One possibility is to use comparisons, but <em>no</em> swapping, to find the median. Basically, for three numbers <img src='http://s0.wp.com/latex.php?latex=a&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='a' title='a' class='latex' />, <img src='http://s0.wp.com/latex.php?latex=b&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='b' title='b' class='latex' />, <img src='http://s0.wp.com/latex.php?latex=c&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='c' title='c' class='latex' />, we want to know which one of the following six arrangements</p>
<p align="center"><img src='http://s0.wp.com/latex.php?latex=a+%5Cleqslant+b+%5Cleqslant+c&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='a &#92;leqslant b &#92;leqslant c' title='a &#92;leqslant b &#92;leqslant c' class='latex' />,</p>
<p align="center"><img src='http://s0.wp.com/latex.php?latex=a+%5Cleqslant+c+%5Cleqslant+b&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='a &#92;leqslant c &#92;leqslant b' title='a &#92;leqslant c &#92;leqslant b' class='latex' />,</p>
<p align="center"><img src='http://s0.wp.com/latex.php?latex=b+%5Cleqslant+a+%5Cleqslant+c&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='b &#92;leqslant a &#92;leqslant c' title='b &#92;leqslant a &#92;leqslant c' class='latex' />,</p>
<p align="center"><img src='http://s0.wp.com/latex.php?latex=b+%5Cleqslant+c+%5Cleqslant+a&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='b &#92;leqslant c &#92;leqslant a' title='b &#92;leqslant c &#92;leqslant a' class='latex' />,</p>
<p align="center"><img src='http://s0.wp.com/latex.php?latex=c+%5Cleqslant+a+%5Cleqslant+b&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='c &#92;leqslant a &#92;leqslant b' title='c &#92;leqslant a &#92;leqslant b' class='latex' />,</p>
<p>or</p>
<p align="center"><img src='http://s0.wp.com/latex.php?latex=c+%5Cleqslant+b+%5Cleqslant+a&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='c &#92;leqslant b &#92;leqslant a' title='c &#92;leqslant b &#92;leqslant a' class='latex' /></p>
<p>holds.</p>
<p>The goal is therefore to devise a testing procedure that determines, in an optimal number of steps, which one of the above six arrangements corresponds to the actual values of <img src='http://s0.wp.com/latex.php?latex=a&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='a' title='a' class='latex' />, <img src='http://s0.wp.com/latex.php?latex=b&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='b' title='b' class='latex' />, and <img src='http://s0.wp.com/latex.php?latex=c&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='c' title='c' class='latex' />. We can proceed by elimination. Comparing any pair, say <img src='http://s0.wp.com/latex.php?latex=a%5Cleqslant+b&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='a&#92;leqslant b' title='a&#92;leqslant b' class='latex' />, will split the above six arrangements into two groups: one in which <img src='http://s0.wp.com/latex.php?latex=a%5Cleqslant+b&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='a&#92;leqslant b' title='a&#92;leqslant b' class='latex' /> holds (with three arrangements) and one in which it does not (also with three arrangements). Each group can be further divided by asking another question, say, <img src='http://s0.wp.com/latex.php?latex=b%5Cleqslant+c&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='b&#92;leqslant c' title='b&#92;leqslant c' class='latex' />, which also creates two sub-groups, and so on, until all groups have been broken down to exactly one arrangement. In C++, that would look pretty much like:</p>
<p><pre class="brush: cpp;">
template &lt;typename T&gt;
const T &amp; median3( const T &amp; a,
                   const T &amp; b,
                   const T &amp; c )
 {
  if (a&lt;b) // {a,b,c}, {a,c,b}, {c,a,b}
   {
    if (b&lt;c)
     return b; // {a,b,c}
    else // {a,c,b}, {c,a,b}
     {
      if (a&lt;c)
       return c; // {a,c,b}
      else
       return a; // {c,a,b}
     }
   }
  else // {b,a,c}, {b,c,a}, {c,b,a}
   {
    if (b&lt;c) // {b,a,c}, {b,c,a}
     {
      if (a&lt;c)
       return a; // {b,a,c}
      else
       return c; // {b,c,a}
     }
    else
     return b; // {c,b,a}
   }
 }
</pre></p>
<p>You will have understood that the method works (to single out one of the <em>n</em>! possible arrangements, it asks essentially <img src='http://s0.wp.com/latex.php?latex=O%28%5Clg+n%21%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(&#92;lg n!)' title='O(&#92;lg n!)' class='latex' /> questions for <img src='http://s0.wp.com/latex.php?latex=n&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n' title='n' class='latex' /> numbers, at most three for three numbers, and it does not move objects around), but also that the resulting if-tree grows rapidly with the number of items to sort. That is, good luck if you have to find the median of 25 numbers.</p>
<p>Moreover, this method is branch-intensive: it will branch more or less randomly, which makes it difficult for the <a href="http://en.wikipedia.org/wiki/Branch_predictor" target="_blank">branch predictor</a> to guess the next branch, therefore (probably) harming performance quite a bit.</p>
<p>If we allow swaps (and consider them inexpensive), we can do something using <a href="http://en.wikipedia.org/wiki/Sorting_network" target="_blank">sorting networks</a>.</p>
<p>A <em>sorting network</em> can be understood through the metaphor of a train-yard sorting station. You have <img src='http://s0.wp.com/latex.php?latex=n&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n' title='n' class='latex' /> lanes, and along the lanes, you have a certain number of exchanging junctions onto other lanes. That is, if a train on a lane <img src='http://s0.wp.com/latex.php?latex=i&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='i' title='i' class='latex' /> switches to lane <img src='http://s0.wp.com/latex.php?latex=j&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='j' title='j' class='latex' />, then the train on lane <img src='http://s0.wp.com/latex.php?latex=j&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='j' title='j' class='latex' /> switches to lane <img src='http://s0.wp.com/latex.php?latex=i&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='i' title='i' class='latex' />.</p>
<p>The goal is to figure out the most efficient network (with the smallest number of switches) for a given number of lanes. Let us see what sorting networks look like.</p>
<p>For <img src='http://s0.wp.com/latex.php?latex=n%3D1&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n=1' title='n=1' class='latex' /> or <img src='http://s0.wp.com/latex.php?latex=n%3D2&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n=2' title='n=2' class='latex' />, there isn&#8217;t much to do: with <img src='http://s0.wp.com/latex.php?latex=n%3D2&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n=2' title='n=2' class='latex' />, you just compare the two values and swap them if the first is larger than the second (that is, if <img src='http://s0.wp.com/latex.php?latex=a%3Eb&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='a&gt;b' title='a&gt;b' class='latex' />, when you want increasing order). In C++, the basic operation is therefore given by:</p>
<p><pre class="brush: cpp;">
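#include &lt;utility&gt; // std::swap

template &lt;typename T&gt;
void cmpexchg(T &amp; a, T &amp; b)
 {
  using std::swap; // ADL trick: picks up a user-defined swap for T, if any
  if (a&gt;b) swap(a,b); // afterwards, a is not larger than b
 }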
</pre></p>
<p>and the graph representing the network is denoted as:</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/network-2.png"><img src="http://hbfs.files.wordpress.com/2011/12/network-2.png?w=450" alt="" title="network-2"   class="aligncenter size-full wp-image-3819" /></a></p>
<p>Where lanes are joined by lines, and circle/dots indicate that the lanes are connected through the line (this is somewhat of an electrical engineering notation, but the large dots, although redundant, make it easier to understand the graph). At <img src='http://s0.wp.com/latex.php?latex=n%3D3&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n=3' title='n=3' class='latex' />, we have a network such as:</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/network-3.png"><img src="http://hbfs.files.wordpress.com/2011/12/network-3.png?w=450" alt="" title="network-3"   class="aligncenter size-full wp-image-3820" /></a></p>
<p>(where the color lines locate &#8216;sequence points&#8217; in the network which determine which compares can be done in parallel) which would yield the code:</p>
<p><pre class="brush: cpp;">
template &lt;typename T&gt;
T median3_network( T a,
                   T b,
                   T c )
 {
  cmpexchg(a,c);
  cmpexchg(a,b);
  cmpexchg(b,c);
  return b;
 }
</pre></p>
<p>One of the possible networks for <img src='http://s0.wp.com/latex.php?latex=n%3D4&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n=4' title='n=4' class='latex' /> is given by:</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/network-4.png"><img src="http://hbfs.files.wordpress.com/2011/12/network-4.png?w=450" alt="" title="network-4"   class="aligncenter size-full wp-image-3821" /></a></p>
<p>and finally, for <img src='http://s0.wp.com/latex.php?latex=n%3D5&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n=5' title='n=5' class='latex' />, one of the possible solutions looks like:</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/network-5.png"><img src="http://hbfs.files.wordpress.com/2011/12/network-5.png?w=450" alt="" title="network-5"   class="aligncenter size-full wp-image-3822" /></a></p>
<p>&#8230;which is already rather more complex than with <img src='http://s0.wp.com/latex.php?latex=n%3D4&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n=4' title='n=4' class='latex' />.</p>
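<p>In code, a four-lane network can be written with the same compare-exchange primitive. The sketch below uses one well-known arrangement of five comparators; it is not necessarily the arrangement drawn in the figure above, and the name <tt>sort4_network</tt> is made up for the example.</p>
<p><pre class="brush: cpp;">
#include &lt;utility&gt; // std::swap

// compare-exchange primitive, repeated here so the sketch stands alone
template &lt;typename T&gt;
void cmpexchg(T &amp; a, T &amp; b)
 {
  using std::swap;
  if (a&gt;b) swap(a,b);
 }

// Illustrative five-comparator network for four values;
// on return, a, b, c, d are in increasing order.
template &lt;typename T&gt;
void sort4_network(T &amp; a, T &amp; b, T &amp; c, T &amp; d)
 {
  cmpexchg(a,b); // these two are independent and
  cmpexchg(c,d); // could be done in parallel
  cmpexchg(a,c); // a now holds the minimum
  cmpexchg(b,d); // d now holds the maximum
  cmpexchg(b,c); // orders the two middle values
 }
</pre></p>
<p>The five comparators are arranged in three layers, since the first two and the next two are mutually independent.</p>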
<p>The advantage of a sorting network is that, in theory, a great number of comparisons/exchanges can be made in parallel (as they are mutually independent), which collapses the depth of the network significantly. Assuming that we can do up to <img src='http://s0.wp.com/latex.php?latex=n&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n' title='n' class='latex' /> comparisons in parallel, the depth of such a network is <img src='http://s0.wp.com/latex.php?latex=O%28%28%5Clg+n%29%5E2%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O((&#92;lg n)^2)' title='O((&#92;lg n)^2)' class='latex' /> (but with size <img src='http://s0.wp.com/latex.php?latex=O%28n+%28%5Clg+n%29%5E2%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(n (&#92;lg n)^2)' title='O(n (&#92;lg n)^2)' class='latex' />).</p>
<p>The disadvantage is that, in general, it is <em>hard</em> (in the <a href="http://en.wikipedia.org/wiki/Co-NP" target="_blank">co-NP</a>-complete sense) to determine whether or not a given network is optimal for the size of the problem (or, conversely, it is <a href="http://en.wikipedia.org/wiki/NP-complete" target="_blank">NP-Complete</a> to arrive at an optimal network)!</p>
<p align="center">*<br />*&emsp;*</p>
<p>So sorting networks are great if you can have parallel compare/exchange operations, but when we examine the primitive <tt>cmpexchg(T &amp; a, T &amp; b)</tt>, we see they&#8217;re not fundamentally more efficient than the if-tree based solution. If anything, they&#8217;re much worse if the cost of swapping is high.</p>
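<p>For cheap, built-in types, one variation worth trying is to write the compare-exchange without an explicit <tt>if</tt>, in terms of <tt>std::min</tt> and <tt>std::max</tt>. This is only a sketch under the assumption that copies are cheap; the name <tt>cmpexchg_minmax</tt> is made up here, and whether the compiler actually turns it into branch-free code depends on the type, the compiler, and the target, so it should be measured rather than taken for granted.</p>
<p><pre class="brush: cpp;">
#include &lt;algorithm&gt; // std::min, std::max

// Illustrative variant: copies both values instead of swapping,
// so it only makes sense when T is cheap to copy (int, float, ...).
template &lt;typename T&gt;
void cmpexchg_minmax(T &amp; a, T &amp; b)
 {
  const T lo = std::min(a, b);
  const T hi = std::max(a, b);
  a = lo;
  b = hi;
 }
</pre></p>
<p>For heavy objects, this variant copies both values on every call, which only makes the cost of moving things around worse.</p>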
<p align="center">*<br />*&emsp;*</p>
<p>Can we use a primitive such as <tt>median3</tt> to get the median of a much larger number of values than just three? What is a <tt>median3</tt> of three <tt>median3</tt>?</p>
<p><em>To Be Continued&#8230;</em></p>
<br />Filed under: <a href='http://hbfs.wordpress.com/category/algorithms/'>algorithms</a>, <a href='http://hbfs.wordpress.com/category/c/'>C</a>, <a href='http://hbfs.wordpress.com/category/c-plus-plus/'>C-plus-plus</a>, <a href='http://hbfs.wordpress.com/category/data-structures/'>data structures</a>, <a href='http://hbfs.wordpress.com/category/programming/'>programming</a> Tagged: <a href='http://hbfs.wordpress.com/tag/adl/'>ADL</a>, <a href='http://hbfs.wordpress.com/tag/cmpxchg/'>cmpxchg</a>, <a href='http://hbfs.wordpress.com/tag/median/'>median</a>, <a href='http://hbfs.wordpress.com/tag/sorting-networks/'>Sorting Networks</a>]]></content:encoded>
			<wfw:commentRss>http://hbfs.wordpress.com/2012/01/10/medians-part-ii/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/56c87cdd18d9a97255c0343c80fba38d?s=96&#38;d=identicon" medium="image">
			<media:title type="html">stevenpigeon</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/split-rock-small.jpg" medium="image">
			<media:title type="html">split-rock-small</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/network-2.png" medium="image">
			<media:title type="html">network-2</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/network-3.png" medium="image">
			<media:title type="html">network-3</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/network-4.png" medium="image">
			<media:title type="html">network-4</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/network-5.png" medium="image">
			<media:title type="html">network-5</media:title>
		</media:content>
	</item>
		<item>
		<title>Building a Balanced Tree From a List in Linear Time</title>
		<link>http://hbfs.wordpress.com/2012/01/03/building-a-balanced-tree-from-a-list-in-linear-time/</link>
		<comments>http://hbfs.wordpress.com/2012/01/03/building-a-balanced-tree-from-a-list-in-linear-time/#comments</comments>
		<pubDate>Tue, 03 Jan 2012 15:41:00 +0000</pubDate>
		<dc:creator>Steven Pigeon</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[data compression]]></category>
		<category><![CDATA[data structures]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[phase-in codes]]></category>
		<category><![CDATA[Huffman]]></category>
		<category><![CDATA[Huffman Codes]]></category>
		<category><![CDATA[Tree]]></category>
		<category><![CDATA[Search Tree]]></category>
		<category><![CDATA[Segregated Storage]]></category>
		<category><![CDATA[Compact Tree]]></category>
		<category><![CDATA[Compact Tree Storage]]></category>

		<guid isPermaLink="false">http://hbfs.wordpress.com/?p=3846</guid>
		<description><![CDATA[The usual way of forming a search tree from a list is to scan the list and insert each of its element, one by one, into the tree, leading to a(n expected) run-time of . However, if the list is sorted (in ascending order, say) and the tree is not one of the self-balancing varieties, [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hbfs.wordpress.com&amp;blog=4426521&amp;post=3846&amp;subd=hbfs&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>The usual way of forming a <a href="http://en.wikipedia.org/wiki/Search_tree" target="_blank">search tree</a> from a <a href="http://en.wikipedia.org/wiki/List_%28abstract_data_type%29" target="_blank">list</a> is to scan the list and insert each of its elements, one by one, into the tree, leading to a(n expected) run-time of <img src='http://s0.wp.com/latex.php?latex=O%28n+%5Clg+n%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(n &#92;lg n)' title='O(n &#92;lg n)' class='latex' />.</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/img_6725-small.jpg"><img src="http://hbfs.files.wordpress.com/2011/12/img_6725-small.jpg?w=200&#038;h=132" alt="" title="IMG_6725-small" width="200" height="132" class="aligncenter size-thumbnail wp-image-3851" /></a></p>
<p>However, if the list is sorted (in ascending order, say) and the tree is not one of the <a href="http://en.wikipedia.org/wiki/Self-balancing_binary_search_tree" target="_blank">self-balancing varieties</a>, insertion is <img src='http://s0.wp.com/latex.php?latex=O%28n%5E2%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(n^2)' title='O(n^2)' class='latex' />, because the &#8220;tree&#8221; created by the successive insertions of sorted keys is in fact a degenerate tree, a list. So, what if the list is already sorted and you don&#8217;t really want to have a self-balancing tree? Well, it turns out that you can build a(n almost perfectly) balanced tree in <img src='http://s0.wp.com/latex.php?latex=O%28n%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(n)' title='O(n)' class='latex' />.</p>
<p><span id="more-3846"></span></p>
<p>Let us make the simplifying assumption that we can have two types of nodes in the tree: leaves, which contain the actual data, and internal nodes (or just nodes for the remainder of this post), which hold only a key.</p>
<p>The first strategy that comes to mind, using this assumption, is to use a method reminiscent of how <a href="http://hbfs.wordpress.com/2011/05/17/huffman-codes/" target="_blank">Huffman Codes</a> are constructed. That is, we scan the original list from left to right: we take two nodes (if there are at least two left), remove them from the list, and insert back in their place a new node containing the key of the second list item (remember, the list is supposed to be sorted in ascending order), with the two original nodes (or leaves) as children. That is, the operation goes something like this:</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/tree-diagram-merge.png"><img src="http://hbfs.files.wordpress.com/2011/12/tree-diagram-merge.png?w=150&#038;h=53" alt="" title="tree-diagram-merge" width="150" height="53" class="aligncenter size-thumbnail wp-image-3852" /></a></p>
<p>(Notice the metaphor: leaves are green, internal nodes brown.) We simply re-scan the list until we have a single remaining node, which will be the root of the tree. The procedure is simple enough, indeed. Let us work out a full example with 5 leaves:</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/tree-diagram1.png"><img src="http://hbfs.files.wordpress.com/2011/12/tree-diagram1.png?w=150&#038;h=21" alt="" title="tree-diagram1" width="150" height="21" class="aligncenter size-thumbnail wp-image-3853" /></a></p>
<p>Then we do one pass of merges:</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/tree-diagram2.png"><img src="http://hbfs.files.wordpress.com/2011/12/tree-diagram2.png?w=150&#038;h=53" alt="" title="tree-diagram2" width="150" height="53" class="aligncenter size-thumbnail wp-image-3854" /></a></p>
<p>Then another:</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/tree-diagram3.png"><img src="http://hbfs.files.wordpress.com/2011/12/tree-diagram3.png?w=150&#038;h=85" alt="" title="tree-diagram3" width="150" height="85" class="aligncenter size-thumbnail wp-image-3855" /></a></p>
<p>&#8230;and finally:</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/tree-diagram4.png"><img src="http://hbfs.files.wordpress.com/2011/12/tree-diagram4.png?w=150&#038;h=109" alt="" title="tree-diagram4" width="150" height="109" class="aligncenter size-thumbnail wp-image-3856" /></a></p>
<p>What do you notice? The tree isn&#8217;t all that balanced. In fact, it&#8217;s <em>really</em> unbalanced. What went wrong? Well, while this algorithm is optimal for Huffman Codes, it clearly is not for search trees. For one thing, Huffman&#8217;s method wants to build trees with average path-lengths as close as possible to the source&#8217;s entropy; here our goal is to have path-lengths as equal as possible.</p>
<p>Fortunately, we can &#8220;repair&#8221; the algorithm quite easily, and it&#8217;s another code that will come to our aid: <a href="http://hbfs.wordpress.com/2008/09/02/phase-in-codes/" target="_blank">Phase-In Codes</a>, which have average code lengths close to <img src='http://s0.wp.com/latex.php?latex=%5Clg+n&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;lg n' title='&#92;lg n' class='latex' /> (as I show <a href="http://hbfs.wordpress.com/2008/09/23/length-of-phase-in-codes/" target="_blank">here</a>).</p>
<p>The key observation is that if the length of the list <img src='http://s0.wp.com/latex.php?latex=n&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n' title='n' class='latex' /> is a power of two, then the list can be successively merged together by the above algorithm and yield a tree that has <em>exactly</em> <img src='http://s0.wp.com/latex.php?latex=%5Clg+n&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;lg n' title='&#92;lg n' class='latex' /> depth. Therefore, if we somehow modify the list to have exactly <img src='http://s0.wp.com/latex.php?latex=2%5Ek&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='2^k' title='2^k' class='latex' /> items in it, we&#8217;ll be able to have our optimal <img src='http://s0.wp.com/latex.php?latex=O%28%5Clg+n%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(&#92;lg n)' title='O(&#92;lg n)' class='latex' /> depth. Phase-In Codes provide the means to do so. If</p>
<p align="center"><img src='http://s0.wp.com/latex.php?latex=n+%3D+2%5Ek%2Bb&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n = 2^k+b' title='n = 2^k+b' class='latex' /></p>
<p>with the largest <img src='http://s0.wp.com/latex.php?latex=k%5Cin%5Cmathbb%7BN%7D&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='k&#92;in&#92;mathbb{N}' title='k&#92;in&#92;mathbb{N}' class='latex' /> and with <img src='http://s0.wp.com/latex.php?latex=b%5Cin%5Cmathbb%7BZ%5E%2A%7D&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='b&#92;in&#92;mathbb{Z^*}' title='b&#92;in&#92;mathbb{Z^*}' class='latex' /> (thus imposing the uniqueness of the solution, with <img src='http://s0.wp.com/latex.php?latex=0%5Cleqslant%7Bb%7D%3C%7B2%5Ek%7D&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='0&#92;leqslant{b}&lt;{2^k}' title='0&#92;leqslant{b}&lt;{2^k}' class='latex' />), we can merge the first <img src='http://s0.wp.com/latex.php?latex=2b&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='2b' title='2b' class='latex' /> nodes, and it will result in a list of <img src='http://s0.wp.com/latex.php?latex=2%5Ek&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='2^k' title='2^k' class='latex' /> nodes (<img src='http://s0.wp.com/latex.php?latex=b&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='b' title='b' class='latex' /> of which are now internal nodes). Once we have merged the first <img src='http://s0.wp.com/latex.php?latex=2b&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='2b' title='2b' class='latex' /> nodes, we stop the iteration there and restart from the beginning of the list, which is now of length <img src='http://s0.wp.com/latex.php?latex=2%5Ek&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='2^k' title='2^k' class='latex' />, and will yield the most-equal depth tree we wanted.</p>
<p>Again, with <img src='http://s0.wp.com/latex.php?latex=n%3D5&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n=5' title='n=5' class='latex' />: <img src='http://s0.wp.com/latex.php?latex=n%3D2%5Ek%2Bb%3D2%5E2%2B1&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n=2^k+b=2^2+1' title='n=2^k+b=2^2+1' class='latex' />, so <img src='http://s0.wp.com/latex.php?latex=b%3D1&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='b=1' title='b=1' class='latex' />. We take the first two nodes</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/tree-diagram1.png"><img src="http://hbfs.files.wordpress.com/2011/12/tree-diagram1.png?w=150&#038;h=21" alt="" title="tree-diagram1" width="150" height="21" class="aligncenter size-thumbnail wp-image-3853" /></a></p>
<p>and merge them:</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/tree-diagram5.png"><img src="http://hbfs.files.wordpress.com/2011/12/tree-diagram5.png?w=150&#038;h=53" alt="" title="tree-diagram5" width="150" height="53" class="aligncenter size-thumbnail wp-image-3860" /></a></p>
<p>and <em>voilà</em>, we proceed with the successive merges:</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/tree-diagram6.png"><img src="http://hbfs.files.wordpress.com/2011/12/tree-diagram6.png?w=150&#038;h=85" alt="" title="tree-diagram6" width="150" height="85" class="aligncenter size-thumbnail wp-image-3861" /></a></p>
<p>and then</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/tree-diagram7.png"><img src="http://hbfs.files.wordpress.com/2011/12/tree-diagram7.png?w=150&#038;h=109" alt="" title="tree-diagram7" width="150" height="109" class="aligncenter size-thumbnail wp-image-3862" /></a></p>
<p>and we have a tree of almost equal depth, with average depth of <img src='http://s0.wp.com/latex.php?latex=%5E%7B12%7D%2F_%7B5%7D%3D2.4&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='^{12}/_{5}=2.4' title='^{12}/_{5}=2.4' class='latex' /> (while the actual <img src='http://s0.wp.com/latex.php?latex=%5Clg+5+%5Capprox+2.32&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;lg 5 &#92;approx 2.32' title='&#92;lg 5 &#92;approx 2.32' class='latex' />) rather than <img src='http://s0.wp.com/latex.php?latex=%5E%7B13%7D%2F_%7B5%7D+%5Capprox+2.6&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='^{13}/_{5} &#92;approx 2.6' title='^{13}/_{5} &#92;approx 2.6' class='latex' />, a considerable improvement, even for this trivial tree.</p>
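<p>The merging procedure can be sketched in C++ as follows. This is only an illustration under stated assumptions: the <tt>Node</tt> type and the functions <tt>merge_pass</tt>, <tt>build</tt>, and <tt>depth_sum</tt> are hypothetical names that do not come from this post, keys are plain <tt>int</tt>s, and the list is held in a <tt>std::vector</tt> that is rebuilt at each pass rather than edited in place.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical node type: a leaf carries a key (standing in for the data);
// an internal node carries the key of its right child, as in the text.
struct Node {
    int key = 0;
    std::unique_ptr<Node> left, right;
    bool is_leaf() const { return !left && !right; }
};

using NodeList = std::vector<std::unique_ptr<Node>>;

// Merges adjacent pairs among the first `count` nodes of `list`.
// An unpaired node, and everything past `count`, is carried over unchanged.
NodeList merge_pass(NodeList list, std::size_t count) {
    NodeList out;
    std::size_t i = 0;
    for (; i + 1 < count; i += 2) {
        auto parent = std::make_unique<Node>();
        parent->key = list[i + 1]->key;   // key of the second item of the pair
        parent->left = std::move(list[i]);
        parent->right = std::move(list[i + 1]);
        out.push_back(std::move(parent));
    }
    for (; i < list.size(); ++i) out.push_back(std::move(list[i]));
    return out;
}

// Builds a tree over sorted `keys`. With phase_in == false this is the
// naive repeated pairing; with phase_in == true the first 2b nodes are
// merged first so that exactly 2^k nodes remain before the full passes.
std::unique_ptr<Node> build(const std::vector<int>& keys, bool phase_in) {
    assert(!keys.empty());
    NodeList list;
    for (int k : keys) {
        auto leaf = std::make_unique<Node>();
        leaf->key = k;
        list.push_back(std::move(leaf));
    }
    if (phase_in) {
        std::size_t pow2 = 1;
        while (pow2 * 2 <= list.size()) pow2 *= 2;   // largest 2^k <= n
        const std::size_t b = list.size() - pow2;
        list = merge_pass(std::move(list), 2 * b);
    }
    while (list.size() > 1) {
        const std::size_t n = list.size();           // read before moving
        list = merge_pass(std::move(list), n);
    }
    return std::move(list.front());
}

// Sum of the depths of all leaves (the root is at depth 0).
std::size_t depth_sum(const Node& node, std::size_t depth = 0) {
    if (node.is_leaf()) return depth;
    return depth_sum(*node.left, depth + 1) + depth_sum(*node.right, depth + 1);
}
```

<p>With the five keys of the running example, <tt>depth_sum(*build({1, 2, 3, 4, 5}, false))</tt> gives 13 and the phase-in variant gives 12, which correspond to the two averages quoted above once divided by 5.</p>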
<p align="center">*<br />*&emsp;*</p>
<p>While it is not the customary way of representing a tree in memory, the segregation of leaves and internal nodes may be justified in the context where the leaves and the tree itself live in different memory locations. For example, if the leaves are rather large compared to just the key, they may have to reside in external memory&mdash;that is, the disk&mdash;while you would still like to keep the index, the tree structure itself, in memory.</p>
<p>One could also argue that this scheme is cache-friendlier as one can use some kind of <a href="http://hbfs.wordpress.com/2009/04/07/compact-tree-storage/" target="_blank">compact tree storage</a> to maintain the keys and the structure of the tree in a small amount of memory, and with keys that are nearby in the tree being laid-out nearby in memory, thus reducing <a href="http://en.wikipedia.org/wiki/CPU_cache#Cache_miss" target="_blank">cache misses</a> quite a lot.</p>
<p>In all cases, we still have reduced the complexity of building the initial tree from <img src='http://s0.wp.com/latex.php?latex=O%28n+%5Clg+n%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(n &#92;lg n)' title='O(n &#92;lg n)' class='latex' /> to essentially <img src='http://s0.wp.com/latex.php?latex=O%28n%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(n)' title='O(n)' class='latex' />.</p>
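<p>One way to see where the linear bound comes from is to count the merges pass by pass, using the same decomposition of the list length into a power of two plus a remainder as above. This is a short worked sum, assuming each merge takes constant time:</p>

```latex
\underbrace{b}_{\text{phase-in pass}}
  + \underbrace{2^{k-1} + 2^{k-2} + \cdots + 2 + 1}_{\text{full passes}}
  = b + (2^k - 1)
  = n - 1
```

<p>Each merge creates exactly one internal node, so a tree over <em>n</em> leaves needs <em>n</em>&minus;1 merges regardless of the order in which they are done. The passes themselves walk lists of lengths <em>n</em>, 2<sup>k</sup>, 2<sup>k-1</sup>, and so on, which adds up to fewer than 3<em>n</em> node visits.</p>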
<br />Filed under: <a href='http://hbfs.wordpress.com/category/algorithms/'>algorithms</a>, <a href='http://hbfs.wordpress.com/category/data-compression/'>data compression</a>, <a href='http://hbfs.wordpress.com/category/data-structures/'>data structures</a>, <a href='http://hbfs.wordpress.com/category/programming/'>programming</a> Tagged: <a href='http://hbfs.wordpress.com/tag/compact-tree/'>Compact Tree</a>, <a href='http://hbfs.wordpress.com/tag/compact-tree-storage/'>Compact Tree Storage</a>, <a href='http://hbfs.wordpress.com/tag/huffman/'>Huffman</a>, <a href='http://hbfs.wordpress.com/tag/huffman-codes/'>Huffman Codes</a>, <a href='http://hbfs.wordpress.com/tag/phase-in-codes/'>phase-in codes</a>, <a href='http://hbfs.wordpress.com/tag/search-tree/'>Search Tree</a>, <a href='http://hbfs.wordpress.com/tag/segregated-storage/'>Segregated Storage</a>, <a href='http://hbfs.wordpress.com/tag/tree/'>Tree</a>]]></content:encoded>
			<wfw:commentRss>http://hbfs.wordpress.com/2012/01/03/building-a-balanced-tree-from-a-list-in-linear-time/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/56c87cdd18d9a97255c0343c80fba38d?s=96&#38;d=identicon" medium="image">
			<media:title type="html">stevenpigeon</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/img_6725-small.jpg?w=150" medium="image">
			<media:title type="html">IMG_6725-small</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/tree-diagram-merge.png?w=150" medium="image">
			<media:title type="html">tree-diagram-merge</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/tree-diagram1.png?w=150" medium="image">
			<media:title type="html">tree-diagram1</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/tree-diagram2.png?w=150" medium="image">
			<media:title type="html">tree-diagram2</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/tree-diagram3.png?w=150" medium="image">
			<media:title type="html">tree-diagram3</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/tree-diagram4.png?w=150" medium="image">
			<media:title type="html">tree-diagram4</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/tree-diagram1.png?w=150" medium="image">
			<media:title type="html">tree-diagram1</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/tree-diagram5.png?w=150" medium="image">
			<media:title type="html">tree-diagram5</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/tree-diagram6.png?w=150" medium="image">
			<media:title type="html">tree-diagram6</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/tree-diagram7.png?w=150" medium="image">
			<media:title type="html">tree-diagram7</media:title>
		</media:content>
	</item>
		<item>
		<title>Suggested Reading: Mini Weapons of Mass Destruction 2</title>
		<link>http://hbfs.wordpress.com/2011/12/30/suggested-reading-mini-weapons-of-mass-destruction-2/</link>
		<comments>http://hbfs.wordpress.com/2011/12/30/suggested-reading-mini-weapons-of-mass-destruction-2/#comments</comments>
		<pubDate>Sat, 31 Dec 2011 01:36:38 +0000</pubDate>
		<dc:creator>Steven Pigeon</dc:creator>
				<category><![CDATA[Life in the workplace]]></category>
		<category><![CDATA[Suggested Reading]]></category>

		<guid isPermaLink="false">http://hbfs.wordpress.com/?p=3838</guid>
		<description><![CDATA[John Austin &#8212;&#160;Mini Weapons of Mass Destruction 2 &#8212; Build A Secret Agent Arsenal&#160;&#8212; Chicago Review Press 2011, 260 pp. ISBN&#160;978-1-56976-716-0 A quite amusing little books on needlessly complicated hacks, but that can bring quite the ruckus in the office/school. Q-Tip launchers, (paper) ninja stars, rubberband weapons, CD periscopes, &#8230; all built from readily available [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hbfs.wordpress.com&amp;blog=4426521&amp;post=3838&amp;subd=hbfs&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>John Austin &mdash;&nbsp;<a href="http://www.amazon.com/gp/product/1569767165?ie=UTF8&amp;tag=hardbettfasts-20&amp;linkCode=xm2&amp;camp=1789&amp;creativeASIN=1569767165" target="_blank"><i>Mini Weapons of Mass Destruction 2 &mdash;<br />
Build A Secret Agent Arsenal</i></a>&nbsp;&mdash; Chicago Review Press<br />
2011, 260 pp. ISBN&nbsp;978-1-56976-716-0</p>
<div id="attachment_3839" class="wp-caption aligncenter" style="width: 150px"><a href="http://www.amazon.com/gp/product/1569767165?ie=UTF8&amp;tag=hardbettfasts-20&amp;linkCode=xm2&amp;camp=1789&amp;creativeASIN=1569767165"><img src="http://hbfs.files.wordpress.com/2011/12/mini-weapons-2.jpg?w=450" alt="" title="mini-weapons-2"   class="size-full wp-image-3839" /></a><p class="wp-caption-text">(Buy at Amazon.com)</p></div>
<p>A quite amusing little book on needlessly complicated hacks that can nonetheless cause quite the ruckus in the office/school. Q-Tip launchers, (paper) ninja stars, rubberband weapons, CD periscopes, &#8230; all built from readily available office supplies. In fact, they are all <em>way</em> too complicated, but sooo much fun.</p>
<br />Filed under: <a href='http://hbfs.wordpress.com/category/life-in-the-workplace/'>Life in the workplace</a>, <a href='http://hbfs.wordpress.com/category/suggested-reading/'>Suggested Reading</a>]]></content:encoded>
			<wfw:commentRss>http://hbfs.wordpress.com/2011/12/30/suggested-reading-mini-weapons-of-mass-destruction-2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/56c87cdd18d9a97255c0343c80fba38d?s=96&#38;d=identicon" medium="image">
			<media:title type="html">stevenpigeon</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/mini-weapons-2.jpg" medium="image">
			<media:title type="html">mini-weapons-2</media:title>
		</media:content>
	</item>
		<item>
		<title>Medians (Part I)</title>
		<link>http://hbfs.wordpress.com/2011/12/27/medians-part-i/</link>
		<comments>http://hbfs.wordpress.com/2011/12/27/medians-part-i/#comments</comments>
		<pubDate>Tue, 27 Dec 2011 22:51:40 +0000</pubDate>
		<dc:creator>Steven Pigeon</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[C-plus-plus]]></category>
		<category><![CDATA[data structures]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[theoretical computer science]]></category>
		<category><![CDATA[QuickSort]]></category>
		<category><![CDATA[median]]></category>
		<category><![CDATA[selection]]></category>
		<category><![CDATA[Wirth]]></category>
		<category><![CDATA[Lomuto]]></category>

		<guid isPermaLink="false">http://hbfs.wordpress.com/?p=3806</guid>
		<description><![CDATA[In a previous installment, about filtering noise, we discussed how to use a moving average to even out irregularities in a sequence. Averaging over a window is tricky. First, the window size must make sense in terms of the speed at which the signal changes (in samples), and the average is (overly) sensitive to outliers. [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hbfs.wordpress.com&amp;blog=4426521&amp;post=3806&amp;subd=hbfs&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>In a previous installment, about <a href="http://hbfs.wordpress.com/2009/08/25/filtering-noise-part-i/" target="_blank">filtering noise</a>, we discussed how to use a moving average to even out irregularities in a sequence. Averaging over a window is tricky. First, the window size must make sense in terms of the speed at which the signal changes (in samples); second, the average is (overly) sensitive to <a href="http://en.wikipedia.org/wiki/Outlier" target="_blank">outliers</a>.</p>
<p><a href="http://commons.wikimedia.org/wiki/File:Split_Rock_-_geograph.org.uk_-_25518.jpg"><img src="http://hbfs.files.wordpress.com/2011/12/split-rock-small.jpg?w=450" alt="" title="split-rock-small"   class="aligncenter size-full wp-image-3812" /></a></p>
<p>One way to limit the influence of the outliers for denoising is to use the <a href="http://en.wikipedia.org/wiki/Median" target="_blank">median</a>. However, computing the median is usually trickier than computing the average, or mean, and this first post (in a series of three, over the next few weeks) discusses how to compute the median efficiently using the <em>selection</em> algorithm.</p>
<p><span id="more-3806"></span></p>
<p align="center">*<br />*&emsp;*</p>
<p>Before continuing, let us say a word on why the average is sensitive to outliers. There&#8217;s of course the intuitive explanation that if you have three values, say, 1, 2, and 10, then the average is <img src='http://s0.wp.com/latex.php?latex=%5Capprox+4.3&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;approx 4.3' title='&#92;approx 4.3' class='latex' />, which is rather far from the majority of the samples, which are grouped around 1 and 2. A less intuitive, but also rather straightforward reason is that the sample average <img src='http://s0.wp.com/latex.php?latex=%5Cbar%7Bx%7D&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;bar{x}' title='&#92;bar{x}' class='latex' /> is the solution to the least-squares problem</p>
<p align="center"><img src='http://s0.wp.com/latex.php?latex=%5Cbar%7Bx%7D%3D%5Carg+%5Cmin_%7Bx%7D+%5Csum_%7Bi%7D+%28x_i-x%29%5E2&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;bar{x}=&#92;arg &#92;min_{x} &#92;sum_{i} (x_i-x)^2' title='&#92;bar{x}=&#92;arg &#92;min_{x} &#92;sum_{i} (x_i-x)^2' class='latex' /></p>
<p>that is, it is the value that minimizes the square error between itself and all the sample values. If there&#8217;s an exceedingly large <img src='http://s0.wp.com/latex.php?latex=x_i&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='x_i' title='x_i' class='latex' />, it will pull the average towards itself quite a lot.</p>
<p>The median, say <img src='http://s0.wp.com/latex.php?latex=x_%7Bm%7D&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='x_{m}' title='x_{m}' class='latex' />, is, on the other hand, the solution to the equation</p>
<p align="center"><img src='http://s0.wp.com/latex.php?latex=x_m%3D%5Carg+%5Cmin_%7Bx%7D+%5Csum_%7Bi%7D+%7Cx_i-x%7C&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='x_m=&#92;arg &#92;min_{x} &#92;sum_{i} |x_i-x|' title='x_m=&#92;arg &#92;min_{x} &#92;sum_{i} |x_i-x|' class='latex' /></p>
<p>that is, the value that minimizes the sum of absolute errors. An overly large <img src='http://s0.wp.com/latex.php?latex=x_j&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='x_j' title='x_j' class='latex' /> will not pull on the solution at all if we use the <a href="http://en.wikipedia.org/wiki/Median#The_sample_median" target="_blank">sample median</a>, which is defined as the middle <img src='http://s0.wp.com/latex.php?latex=x_j&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='x_j' title='x_j' class='latex' /> when all <img src='http://s0.wp.com/latex.php?latex=x_i&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='x_i' title='x_i' class='latex' /> are sorted. Basically, the median (mostly) ignores outliers, which is something we might want for filtering.</p>
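<p>The two criteria can be checked numerically on the three-sample example. The sketch below is only an illustration: the helper names <tt>mean</tt>, <tt>median</tt>, <tt>squared_error</tt>, and <tt>absolute_error</tt> are hypothetical, and the median is obtained by a plain full sort, which is the simple way rather than the efficient one.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Arithmetic mean of a non-empty sample.
double mean(const std::vector<double>& xs) {
    assert(!xs.empty());
    return std::accumulate(xs.begin(), xs.end(), 0.0)
           / static_cast<double>(xs.size());
}

// Sample median by full sort. Works on a copy, so the caller's data is
// left untouched. For even sizes this returns the upper middle sample.
double median(std::vector<double> xs) {
    assert(!xs.empty());
    std::sort(xs.begin(), xs.end());
    return xs[xs.size() / 2];
}

// Sum of squared errors between the samples and a candidate value x.
double squared_error(const std::vector<double>& xs, double x) {
    double sum = 0.0;
    for (double xi : xs) sum += (xi - x) * (xi - x);
    return sum;
}

// Sum of absolute errors between the samples and a candidate value x.
double absolute_error(const std::vector<double>& xs, double x) {
    double sum = 0.0;
    for (double xi : xs) sum += std::fabs(xi - x);
    return sum;
}
```

<p>For the samples 1, 2, 10 the mean is about 4.33 and the median is 2. The mean gives the smaller squared error and the median the smaller absolute error. Pushing the outlier from 10 to 1000 moves the mean to about 334.3 and leaves the median at 2.</p>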
<p></p>
<p align="center">*<br />*&emsp;*</p>
<p>Since we&#8217;re interested in the sample median, we need a way of finding the <img src='http://s0.wp.com/latex.php?latex=x_j&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='x_j' title='x_j' class='latex' /> that occupies the middle rank. Usually, the median is defined as the sample of rank <img src='http://s0.wp.com/latex.php?latex=%5Clceil+n%2F2+%5Crceil&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;lceil n/2 &#92;rceil' title='&#92;lceil n/2 &#92;rceil' class='latex' /> if <img src='http://s0.wp.com/latex.php?latex=n&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n' title='n' class='latex' /> is odd, and of rank <img src='http://s0.wp.com/latex.php?latex=n%2F2&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n/2' title='n/2' class='latex' /> if <img src='http://s0.wp.com/latex.php?latex=n&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n' title='n' class='latex' /> is even (or sometimes the average of the two samples lying at the center is used, but that&#8217;s a minor detail for what we will discuss).</p>
<p>A first idea would be to sort the list, in <img src='http://s0.wp.com/latex.php?latex=O%28n+%5Clg+n%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(n &#92;lg n)' title='O(n &#92;lg n)' class='latex' /> using a robust sorting algorithm such as <a href="http://en.wikipedia.org/wiki/Quicksort" target="_blank">Quicksort</a>, or even in linear time if one can use <a href="http://hbfs.wordpress.com/2009/06/16/sorting-linked-lists-part-i/" target="_blank">Radix Sort</a>. But Radix Sort is a &#8220;big&#8221; linear time algorithm because it depends on the size of the machine-specific integers you&#8217;re using as keys. Could it be that completely sorting the list is overkill, since we&#8217;re not interested in sorting all the values, just in getting the middle value once the list is (sufficiently) sorted?</p>
<p>So how do we sort an array &#8220;just enough&#8221; to assess the median value?</p>
<p>Well, it turns out that Quicksort will lend us quite a hand here. Remember, Quicksort sorts quickly by dividing the array in two sub-arrays, with the one on the left having all values smaller than the pivot (a value picked within the array) and the one on the right having values larger than or equal to the pivot. If the pivot is chosen wisely, both sub-arrays should be roughly equal in length. Recursing on each sub-array does the same thing again, all the way down to sub-arrays so small that they contain only one element. At that point, when the recursion has visited all possible sub-arrays, the array is sorted and the algorithm terminates.</p>
<p>So what if we only sorted one side, the side that might contain the median? The idea is to use the Quicksort partition algorithm (and there are more than one, but that&#8217;s more or less irrelevant right now) and look at the two resulting sub-arrays. Is the median position, <img src='http://s0.wp.com/latex.php?latex=k%3D%5Clceil+n%2F2%5Crceil&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='k=&#92;lceil n/2&#92;rceil' title='k=&#92;lceil n/2&#92;rceil' class='latex' />, somewhere in the left or the right sub-array? If it&#8217;s in the left, we reapply <em>only</em> on the left sub-array; if it&#8217;s in the right, we reapply <em>only</em> on the right sub-array. Again, we will obtain two sub-sub-arrays, and we check whether <img src='http://s0.wp.com/latex.php?latex=k&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='k' title='k' class='latex' /> is in the left sub-sub-array or the right sub-sub-array. We continue splitting until we create a sub-sub-&#8230;-sub-array of size one, in position <img src='http://s0.wp.com/latex.php?latex=k&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='k' title='k' class='latex' />: it will contain the median value!</p>
<p>A minimal C++ implementation would probably look something like this:</p>
<p><pre class="brush: cpp;">
#include &lt;utility&gt; // std::swap (in &lt;algorithm&gt; before C++11)
#include &lt;vector&gt;

////////////////////////////////////////
//
// Implements the generic selection
// algorithm loosely based on the
// QuickSort partition algorithm
// (more or less any of the QuickSort
// partition algorithms will do just
// fine)
//
// (it may be due to Wirth, from his
// book algorithms + data structures
// = programs)
//
// NOTE: array v gets modified in the
// process!
//
template &lt;typename T&gt;
const T &amp; select( std::vector&lt;T&gt; &amp; v,
                  int select_index)
  {
   // check if lo_index &lt;= select_index &lt;= hi_index
   // ...eventually
   //
   int lo_index=0;
   int hi_index=v.size()-1;

   while (hi_index&gt;lo_index)
    {
     // classic mid-point
     // pivot selection (greatly
     // reduces chances of n^2
     // behavior if v is
     // nearly sorted, as it would
     // be after a few calls to select)
     //
     const T pivot = v[(lo_index+hi_index)/2];

     int x=lo_index;
     int y=hi_index;

     // basic pivot/partition
     // algorithm
     //
     while (x&lt;=y)
      {
       while (v[x]&lt;pivot) x++;
       while (v[y]&gt;pivot) y--;
       if (x&lt;=y)
        {
         std::swap(v[x],v[y]);
         x++;
         y--;
        }
      }

     // check on which side of
     // the partition lands the
     // select_index, so that we
     // iterate only on the half
     // that contains the select_index
     //
     if (select_index &gt;= x)
      lo_index=x;
     else
      hi_index=x-1;
    }

   // at this point lo_index, hi_index and
   // select_index coincide, so v[select_index]
   // holds the value it would have if v
   // were fully sorted
   return v[select_index];
  }
</pre></p>
<p>You would invoke the above using, say</p>
<p><pre class="brush: cpp;">
std::vector&lt;T&gt; v;

//...fill v....

T med = select(v,v.size()/2);
</pre></p>
<p>for some type <tt>T</tt>.</p>
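<p>For comparison, the C++ standard library already provides a selection routine, <tt>std::nth_element</tt> (in <tt>&lt;algorithm&gt;</tt>), which rearranges a range so that the element at a chosen position is the one that would be there if the range were sorted. The wrapper below is only a sketch: the name <tt>median_nth_element</tt> is hypothetical, and it works on a copy so the caller&#8217;s vector is left alone.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper (not from the original post) around the
// standard std::nth_element, selecting the element at index n/2.
template <typename T>
T median_nth_element(std::vector<T> v)   // by value: works on a copy
{
    assert(!v.empty());
    const auto k = static_cast<std::ptrdiff_t>(v.size() / 2);
    auto mid = v.begin() + k;

    // after this call, *mid is the value a full sort would put there,
    // and no element before mid is greater than *mid
    std::nth_element(v.begin(), mid, v.end());
    return *mid;
}
```

<p>The standard only promises linear complexity on average for <tt>std::nth_element</tt>, so it sits in the same family of algorithms as the hand-written version.</p>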
<p align="center">*<br />*&emsp;*</p>
<p>The above method has an expected run-time of <img src='http://s0.wp.com/latex.php?latex=O%28n%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(n)' title='O(n)' class='latex' />: when each partition roughly halves the range still under consideration, the total work is about <img src='http://s0.wp.com/latex.php?latex=n%2Bn%2F2%2Bn%2F4%2B%5Ccdots%3C2n&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n+n/2+n/4+&#92;cdots&lt;2n' title='n+n/2+n/4+&#92;cdots&lt;2n' class='latex' />. But, just like Quicksort, it can run as high as <img src='http://s0.wp.com/latex.php?latex=O%28n%5E2%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(n^2)' title='O(n^2)' class='latex' />, which is quite bad. The choice of the pivot in the middle of the current sub-array reduces this risk quite a lot (and it would take <a href="http://hbfs.wordpress.com/2010/03/16/foiled/" target="_blank">quite an evil array</a> to push it back to <img src='http://s0.wp.com/latex.php?latex=O%28n%5E2%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(n^2)' title='O(n^2)' class='latex' /> behavior). Furthermore, each time <tt>select</tt> is called on the array, it gets more and more sorted, making the choice of the pivot better and better with time.</p>
<p>On the downside, it does swap elements multiple times (though the expected number of swaps is only linear in <img src='http://s0.wp.com/latex.php?latex=n&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n' title='n' class='latex' />), and, depending on the size of the array and the partition method, it might not be cache-friendly at all. As I said, there are many partition variants: the one above is probably Hoare&#8217;s, but there is also Lomuto&#8217;s method, discussed in Cormen <em>et al.</em>&#8217;s <i>Introduction to Algorithms</i>.</p>
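<p>For reference, a selection loop built on a Lomuto-style partition might look like the sketch below. The name <tt>select_lomuto</tt> is hypothetical, and moving the mid-point element to the end to serve as pivot is an assumption of this sketch rather than part of Lomuto&#8217;s scheme itself.</p>

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical variant: selection using a Lomuto-style partition.
// NOTE: v gets modified in the process, as with any in-place selection.
template <typename T>
const T & select_lomuto(std::vector<T> & v, int select_index)
{
    assert(select_index >= 0 && select_index < static_cast<int>(v.size()));

    int lo = 0;
    int hi = static_cast<int>(v.size()) - 1;

    while (lo < hi)
    {
        // move the mid-point element to the end and use it as pivot
        std::swap(v[lo + (hi - lo) / 2], v[hi]);
        const T pivot = v[hi];

        // single left-to-right scan: v[lo..store-1] collects
        // the values strictly smaller than the pivot
        int store = lo;
        for (int i = lo; i < hi; i++)
            if (v[i] < pivot)
            {
                std::swap(v[i], v[store]);
                store++;
            }

        // the pivot lands in its final sorted position
        std::swap(v[store], v[hi]);

        if (select_index == store) break;   // found it
        if (select_index < store)
            hi = store - 1;                 // continue on the left part only
        else
            lo = store + 1;                 // continue on the right part only
    }

    return v[select_index];
}
```

<p>The scan reads the array strictly left to right, which is simple to follow, but it typically performs more swaps than the two-index scheme.</p>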
<p>In the next post, we will examine other sort-like algorithms to obtain the median, maybe at lower cost.</p>
<p><em>To Be Continued&#8230;</em></p>
<br />Filed under: <a href='http://hbfs.wordpress.com/category/algorithms/'>algorithms</a>, <a href='http://hbfs.wordpress.com/category/c/'>C</a>, <a href='http://hbfs.wordpress.com/category/c-plus-plus/'>C-plus-plus</a>, <a href='http://hbfs.wordpress.com/category/data-structures/'>data structures</a>, <a href='http://hbfs.wordpress.com/category/programming/'>programming</a>, <a href='http://hbfs.wordpress.com/category/theoretical-computer-science/'>theoretical computer science</a> Tagged: <a href='http://hbfs.wordpress.com/tag/lomuto/'>Lomuto</a>, <a href='http://hbfs.wordpress.com/tag/median/'>median</a>, <a href='http://hbfs.wordpress.com/tag/quicksort/'>QuickSort</a>, <a href='http://hbfs.wordpress.com/tag/selection/'>selection</a>, <a href='http://hbfs.wordpress.com/tag/wirth/'>Wirth</a>]]></content:encoded>
			<wfw:commentRss>http://hbfs.wordpress.com/2011/12/27/medians-part-i/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/56c87cdd18d9a97255c0343c80fba38d?s=96&#38;d=identicon" medium="image">
			<media:title type="html">stevenpigeon</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/split-rock-small.jpg" medium="image">
			<media:title type="html">split-rock-small</media:title>
		</media:content>
	</item>
	</channel>
</rss>
