<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Harder, Better, Faster, Stronger</title>
	<atom:link href="http://hbfs.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://hbfs.wordpress.com</link>
	<description>Explorations in better, faster, stronger code.</description>
	<lastBuildDate>Tue, 31 Jan 2012 16:15:35 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='hbfs.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Harder, Better, Faster, Stronger</title>
		<link>http://hbfs.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://hbfs.wordpress.com/osd.xml" title="Harder, Better, Faster, Stronger" />
	<atom:link rel='hub' href='http://hbfs.wordpress.com/?pushpress=hub'/>
		<item>
		<title>(Random Musings) On Floats and Encodings</title>
		<link>http://hbfs.wordpress.com/2012/01/31/random-musings-on-floats-and-encodings/</link>
		<comments>http://hbfs.wordpress.com/2012/01/31/random-musings-on-floats-and-encodings/#comments</comments>
		<pubDate>Tue, 31 Jan 2012 16:15:27 +0000</pubDate>
		<dc:creator>Steven Pigeon</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[C-plus-plus]]></category>
		<category><![CDATA[data compression]]></category>
		<category><![CDATA[data structures]]></category>
		<category><![CDATA[hacks]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[double]]></category>
		<category><![CDATA[extended]]></category>
		<category><![CDATA[float]]></category>
		<category><![CDATA[IEEE 754]]></category>
		<category><![CDATA[octuple]]></category>
		<category><![CDATA[quadruple]]></category>

		<guid isPermaLink="false">http://hbfs.wordpress.com/?p=3943</guid>
		<description><![CDATA[The float and double floating-point data types have been present for a long time in the C (and C++) standard. While neither the C nor the C++ standard enforces it, virtually all implementations comply with IEEE 754, or try very hard to. In fact, I do not know as of today of an implementation [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hbfs.wordpress.com&amp;blog=4426521&amp;post=3943&amp;subd=hbfs&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>The <tt>float</tt> and <tt>double</tt> floating-point data types have been present for a long time in the C (and C++) standard. While neither the C nor the C++ standard enforces it, virtually all implementations comply with <a href="http://en.wikipedia.org/wiki/IEEE_754-2008" target="_blank">IEEE 754</a>, or try very hard to. In fact, as of today, I do not know of an implementation that uses something very different. But the IEEE 754-type floats are aging. GPUs have started to add extensions such as <a href="http://en.wikipedia.org/wiki/Half_precision" target="_blank">short floats</a> for evident reasons. Should we start considering adding new types on both ends of the spectrum?</p>
<p><a href="http://en.wikipedia.org/wiki/File:Elephant_skeleton.jpg"><img src="http://hbfs.files.wordpress.com/2012/01/elephant-small.jpg?w=450" alt="" title="elephant-small"   class="aligncenter size-full wp-image-3949" /></a></p>
<p>The next step up, the <a href="http://en.wikipedia.org/wiki/Quadruple_precision_floating-point_format" target="_blank">quadruple precision float</a>, is already part of the standard, but, as far as I know, not implemented in hardware on mainstream CPUs. Intel x86 does have something in between for its internal float format on 80 bits, the so-called <a href="http://en.wikipedia.org/wiki/Extended_precision" target="_blank">extended precision</a>, but it&#8217;s not one of the basic IEEE 754 formats (the standard only sets minimum requirements for extended formats), and, surprisingly enough, it is supported only by the legacy x87 instructions, not by the SSE instruction set. It&#8217;s sometimes supported by the <tt>long double</tt> C type. But, anyway, what&#8217;s in a floating-point number?</p>
<p><span id="more-3943"></span></p>
<p>Let us take the 32-bit <tt>float</tt> as an example. There&#8217;s one bit assigned to the sign (therefore it&#8217;s not a <a href="http://en.wikipedia.org/wiki/Two%27s_complement" target="_blank">2s complement representation</a>), 8 bits for the exponent, stored with a bias of +127 and representing exponents from -126 to 127 for normalized numbers (the all-zeros and all-ones patterns are reserved for zero and subnormals, and for infinities and NaNs), and 23 bits for the mantissa, the &#8220;precision bits&#8221; (the most significant bit, always 1, is not stored, which allows one more low-weight bit to be shoved in). That is, it looks like:</p>
<div id="attachment_3946" class="wp-caption aligncenter" style="width: 460px"><a href="http://en.wikipedia.org/wiki/File:Float_example.svg"><img src="http://hbfs.files.wordpress.com/2012/01/500px-float_example-svg.png?w=450&#038;h=57" alt="" title="500px-Float_example.svg" width="450" height="57" class="size-full wp-image-3946" /></a><p class="wp-caption-text">(From Wikipedia)</p></div>
<p>and the value is given by</p>
<p align="center"><img src='http://s0.wp.com/latex.php?latex=%28-1%29%5Es+2%5E%7Be-127%7D+%281%2B%5Csum_%7Bi%3D1%7D%5E%7B23%7D+2%5E%7B-i%7Df_i%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='(-1)^s 2^{e-127} (1+&#92;sum_{i=1}^{23} 2^{-i}f_i)' title='(-1)^s 2^{e-127} (1+&#92;sum_{i=1}^{23} 2^{-i}f_i)' class='latex' /></p>
<p>where <img src='http://s0.wp.com/latex.php?latex=s&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='s' title='s' class='latex' /> is the sign, <img src='http://s0.wp.com/latex.php?latex=e&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='e' title='e' class='latex' /> the exponent as stored in the <tt>float</tt>, and <img src='http://s0.wp.com/latex.php?latex=%5C%7Bf_i%5C%7D_%7Bi%3D1%7D%5E%7B23%7D&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;{f_i&#92;}_{i=1}^{23}' title='&#92;{f_i&#92;}_{i=1}^{23}' class='latex' /> are the bits of the mantissa, from left to right in the above picture.</p>
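<p>To make the formula concrete, here is a minimal sketch that splits a <tt>float</tt> into its three fields and rebuilds the value from them. The names <tt>float_fields</tt>, <tt>decompose</tt> and <tt>value_of</tt> are made up for the illustration, and the code assumes that <tt>float</tt> is a 32-bit IEEE 754 number, which the language standards do not guarantee.</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Fields of a 32-bit IEEE 754 float (illustrative names).
struct float_fields
 {
  std::uint32_t sign;     // 1 bit
  std::uint32_t exponent; // 8 bits, stored with a bias of 127
  std::uint32_t fraction; // 23 bits, the leading 1 is not stored
 };

// Splits a float into its three fields. Assumes that
// float is a 32-bit IEEE 754 number.
float_fields decompose(float x)
 {
  static_assert(sizeof(float)==sizeof(std::uint32_t),
                "float must be 32 bits");
  std::uint32_t bits;
  std::memcpy(&bits,&x,sizeof bits); // well-defined type punning
  return { bits >> 31, (bits >> 23) & 0xffu, bits & 0x7fffffu };
 }

// Rebuilds the value of a normalized number from its fields,
// following the formula above. Zero, subnormals, infinities
// and NaNs (stored exponent 0 or 255) are not handled.
double value_of(const float_fields & f)
 {
  assert(f.exponent!=0 && f.exponent!=255);
  const double mantissa = 1.0 + f.fraction / 8388608.0; // 2^23
  const double v = std::ldexp(mantissa, int(f.exponent) - 127);
  return f.sign ? -v : v;
 }
```

<p>For example, <tt>decompose(-6.5f)</tt> yields a sign of 1, a stored exponent of 129 (that is, 2 + 127) and a fraction of <tt>0x500000</tt>, since 6.5 = 1.625 &times; 2<sup>2</sup>.</p>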
<p>The <tt>double</tt> uses one sign bit, 11 exponent bits, and 52 (+1 virtual) bits for precision, for a total of 64 bits. The <tt>quadruple</tt> (which is neither a C nor a C++ keyword, unfortunately) uses one sign bit, 15 exponent bits, and 112 (+1 virtual) bits for precision. Its precision is about 34 significant digits. A hypothetical <tt>octuple</tt> would eat up 256 bits, 32 bytes, probably with something like 24 or 32 bits for the exponent, leaving about 224 bits of precision; that&#8217;s about <img src='http://s0.wp.com/latex.php?latex=log_%7B10%7D2%5E%7B225%7D%5Capprox+67.7&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='log_{10}2^{225}&#92;approx 67.7' title='log_{10}2^{225}&#92;approx 67.7' class='latex' /> significant digits (225 because of the virtual leading bit). For comparison, the 256-bit format that IEEE 754-2008 does define, binary256, spends only 19 bits on the exponent and stores 236 significand bits. That&#8217;s significantly more precise than either <tt>float</tt> or <tt>double</tt>, but it would also require significantly bigger <a href="http://en.wikipedia.org/wiki/Floating-point_unit" target="_blank">FPU</a>s to deal with these formats!</p>
<p>Generalizing these formats to smaller floats is also straightforward, but we now face a serious lack of precision. The <a href="http://en.wikipedia.org/wiki/Half_precision_floating-point_format" target="_blank">half float</a>, on 16 bits, uses one sign bit, 5 exponent bits (with a bias of 15), and 10 (+1 virtual) precision bits. The range of normalized values is evidently smaller (from <img src='http://s0.wp.com/latex.php?latex=2%5E%7B-14%7D&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='2^{-14}' title='2^{-14}' class='latex' /> to <img src='http://s0.wp.com/latex.php?latex=%282-2%5E%7B-10%7D%292%5E%7B15%7D%3D65504&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='(2-2^{-10})2^{15}=65504' title='(2-2^{-10})2^{15}=65504' class='latex' />) but the relative steps are rather large (all things considered) at <img src='http://s0.wp.com/latex.php?latex=2%5E%7B-10%7D&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='2^{-10}' title='2^{-10}' class='latex' />. That&#8217;s about 3 significant digits. A byte-size floating point number doesn&#8217;t seem to make all that much sense.</p>
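<p>As an illustration of the 16-bit layout, the following sketch converts a half-float bit pattern to a <tt>float</tt>. The function name <tt>half_to_float</tt> is an assumption made for this example (it is not a standard library function), and the handling of subnormals, infinities and NaNs follows the usual IEEE 754 conventions.</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Converts a 16-bit pattern (1 sign bit, 5 exponent bits
// with a bias of 15, 10 fraction bits) to a float.
float half_to_float(std::uint16_t h)
 {
  const int sign     = (h >> 15) & 1;
  const int exponent = (h >> 10) & 0x1f;
  const int fraction = h & 0x3ff;

  float v;
  if (exponent==0)
   // zero and subnormals: (f/1024) * 2^-14
   v = std::ldexp(float(fraction), -24);
  else if (exponent==31)
   // all-ones exponent: infinities and NaNs
   v = fraction ? NAN : INFINITY;
  else
   // normalized: (1+f/1024) * 2^(e-15)
   v = std::ldexp(float(1024+fraction), exponent-25);

  return sign ? -v : v;
 }
```

<p>With this, <tt>half_to_float(0x7BFF)</tt> gives the largest finite value, 65504, and the gap between <tt>half_to_float(0x3C00)</tt> (that is, 1) and the next pattern <tt>0x3C01</tt> is 2<sup>-10</sup>.</p>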
<p>The half-float format makes sense only if you&#8217;re interested in saving storage space, or gaining computation speed, or minimizing hardware, at the expense of precision; you can&#8217;t have all three. But could we do better with 16 bits than this? Well, since 16 bits aren&#8217;t that much, the first thing that comes to mind is that we could just <em>learn</em> a 64k-entry table filled with higher-precision floats optimized for the precision of the computation at hand. This is in fact a quantization problem where the objective function is not directly the mean square error to the prototypes (as it would be for <a href="http://en.wikipedia.org/wiki/Quantization_%28signal_processing%29#Scalar_quantization" target="_blank">scalar quantization</a> optimized over a probability distribution) but the squared error between the result of the quantized computation and the &#8220;real&#8221; result obtained using high-precision arithmetic everywhere. We now have a look-up for each value, but the table may fit in cache, so the look-ups stay cheap even for rather large computations whose data do not fit in cache.</p>
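<p>A rough sketch of what such a table-based number could look like is given below. Everything in it is hypothetical: the class name <tt>table_float_codec</tt> is invented, and the &#8220;learning&#8221; step merely picks evenly spaced quantiles of some training samples, which is plain scalar quantization and not the computation-aware objective described above. It only shows the mechanics: decoding is a single look-up, encoding is a nearest-prototype search.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A hypothetical 16-bit "learned float": a code is just an
// index into a sorted table of higher-precision prototypes.
class table_float_codec
 {
  std::vector<double> table; // sorted, at most 65536 entries

 public:

  // "Learns" the table by taking evenly spaced quantiles of
  // the training samples (a stand-in for a real optimizer).
  table_float_codec(std::vector<double> samples, std::size_t entries)
   {
    assert(!samples.empty() && entries>=1 && entries<=65536);
    std::sort(samples.begin(),samples.end());
    const std::size_t steps = std::max<std::size_t>(entries-1,1);
    for (std::size_t i=0;i<entries;i++)
     table.push_back(samples[i*(samples.size()-1)/steps]);
    // duplicated prototypes would waste codes
    table.erase(std::unique(table.begin(),table.end()),table.end());
   }

  std::size_t size() const { return table.size(); }

  // decoding is a single look-up
  double decode(std::uint16_t code) const
   {
    assert(code<table.size());
    return table[code];
   }

  // encoding finds the nearest prototype by binary search
  std::uint16_t encode(double x) const
   {
    auto hi = std::lower_bound(table.begin(),table.end(),x);
    if (hi==table.begin()) return 0;
    if (hi==table.end()) return std::uint16_t(table.size()-1);
    auto lo = hi-1;
    return std::uint16_t(((x-*lo) <= (*hi-x) ? lo : hi) - table.begin());
   }
 };
```

<p>With 65536 entries of type <tt>double</tt>, the table takes 512 KB, so whether it actually stays in cache depends on the machine.</p>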
<p align="center">*<br />*&emsp;*</p>
<p>The idea generalizes to other problems. In instruction encoding, one usually decides how to allocate bits within an instruction word to encode the operations to be performed, the type of the operands (immediate, register, memory reference) and the operands. In simple CPUs, like the <a href="http://en.wikipedia.org/wiki/Zilog_Z80" target="_blank">Z80</a> (no, I&#8217;m not <em>that</em> old, I know of it because I used to do embedded programming a lot) the instructions are really crude and the opcode fits on one byte (possibly preceded by a prefix byte).</p>
<p>The instruction set is hierarchical, with groups of prefixes for similar instructions, probably to ease decoding, but it&#8217;s not necessary to do it like this. If you&#8217;re going to have <a href="http://en.wikipedia.org/wiki/Microcode" target="_blank">microcode</a> anyway, the instruction encoding itself is essentially irrelevant: it becomes merely a pointer in the microcode table, and you can assign pretty much any instruction you want to each of your 8-bit (or 16-bit) opcodes, without worrying about having a correct prefix structure or enough bits to encode all possible/interesting registers. The look-up happens inside the CPU, at a very deep level in the micro-architecture, and so is not really affected by the speed discrepancy between the CPU and the memory.</p>
<p>If you want it to be cute, you can enforce a couple of special instructions to have a specific opcode (say <tt>0x00</tt> puts the CPU on <a href="http://en.wikipedia.org/wiki/HLT" target="_blank">HALT</a> (and no, not necessarily <a href="http://en.wikipedia.org/wiki/Halt_and_Catch_Fire" target="_blank">and catch fire</a>), <tt>0x01</tt> for an interrupt, etc.). You could even think of reconfiguring the instruction set and microcode on the fly. Maybe as a per-process setting?</p>
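<p>In software, the same idea is a table-driven interpreter: the opcode is nothing more than an index into a table of handlers. The sketch below is a toy under invented assumptions (a one-register <tt>machine</tt>, made-up opcodes <tt>0x5a</tt> and <tt>0x17</tt>), with only <tt>0x00</tt> fixed to halt as suggested above; it is not a model of any real CPU.</p>

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A toy machine with a single accumulator (hypothetical).
struct machine
 {
  int acc = 0;
  bool halted = false;
 };

using handler = void (*)(machine &);

void op_halt(machine & m) { m.halted = true; }
void op_inc(machine & m)  { m.acc++; }
void op_dbl(machine & m)  { m.acc *= 2; }
void op_nop(machine &)    { }

// The opcode -> handler table plays the role of the microcode
// table: opcode values need no prefix structure at all.
std::array<handler,256> make_table()
 {
  std::array<handler,256> t;
  t.fill(op_nop);
  t[0x00] = op_halt; // fixed by convention
  t[0x5a] = op_inc;  // arbitrary assignments
  t[0x17] = op_dbl;
  return t;
 }

// Runs a program until it halts or ends; returns the accumulator.
int run(const std::vector<std::uint8_t> & program)
 {
  static const std::array<handler,256> table = make_table();
  machine m;
  for (std::size_t pc=0; pc<program.size() && !m.halted; pc++)
   {
    const handler h = table[program[pc]];
    h(m);
   }
  return m.acc;
 }
```

<p>Reassigning opcodes is then just a matter of swapping the table, which is what a reconfigurable instruction set would amount to.</p>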
<p align="center">*<br />*&emsp;*</p>
<p>Significantly extending the precision of the usual float formats, say, moving to 128 and 256 bits, poses many problems. While it is undeniable that we <em>will</em> need those eventually for scientific computing, it means some profound changes in FPU architectures. First, we will need wider data buses. That, we already partly have, with GPUs and newer CPUs. A 32-byte float is smaller than a cache line, so on that side, that&#8217;s not too bad. But what about the hardware that actually computes operations between big floats?</p>
<p>Well, first, I think we all agree on the fact that if we are going forward with these formats, we will not accept painfully slow implementations. This means that the hardware will expand significantly. I would suspect quadratically in the number of input bits in general, while it&#8217;s still open for debate how low the circuit complexity <em>can</em> get&nbsp;[<a href="#1">1</a>].</p>
<p align="center">*<br />*&emsp;*</p>
<p>The idea that we should learn representations rather than designing them by hand is central to <a href="http://en.wikipedia.org/wiki/Data_compression" target="_blank">data compression</a> and <a href="http://en.wikipedia.org/wiki/Machine_learning" target="_blank">machine-learning</a> (which, in fact, can be unified at a rather deep level). In both cases, we want to extract information from the data with as little a priori knowledge as possible and let the algorithms find a representation (or, more accurately, a <em>re</em>-representation) of the data so that there are exploitable structures for further analysis stages. The criteria for &#8220;exploitable&#8221; may vary. In data compression, you want a representation for which you can find a short code (i.e., for which the <a href="http://en.wikipedia.org/wiki/Entropy_%28information_theory%29" target="_blank">entropy</a> is minimized); in machine-learning it may be that you want to minimize prediction error; finding a short code and formulating an accurate prediction are not exactly the same thing but they are very similar.</p>
<hr width="30%" align="left">
<p>[<a name="1">1</a>] Amos R. Omondi, <i>Computer Arithmetic Systems: Algorithms, Architecture, and Implementation</i>, Prentice Hall, 1994</p>
<br />Filed under: <a href='http://hbfs.wordpress.com/category/algorithms/'>algorithms</a>, <a href='http://hbfs.wordpress.com/category/c/'>C</a>, <a href='http://hbfs.wordpress.com/category/c-plus-plus/'>C-plus-plus</a>, <a href='http://hbfs.wordpress.com/category/data-compression/'>data compression</a>, <a href='http://hbfs.wordpress.com/category/data-structures/'>data structures</a>, <a href='http://hbfs.wordpress.com/category/hacks/'>hacks</a>, <a href='http://hbfs.wordpress.com/category/machine-learning/'>machine learning</a>, <a href='http://hbfs.wordpress.com/category/programming/'>programming</a> Tagged: <a href='http://hbfs.wordpress.com/tag/double/'>double</a>, <a href='http://hbfs.wordpress.com/tag/extended/'>extended</a>, <a href='http://hbfs.wordpress.com/tag/float/'>float</a>, <a href='http://hbfs.wordpress.com/tag/ieee-754/'>IEEE 754</a>, <a href='http://hbfs.wordpress.com/tag/octuple/'>octuple</a>, <a href='http://hbfs.wordpress.com/tag/quadruple/'>quadruple</a>]]></content:encoded>
			<wfw:commentRss>http://hbfs.wordpress.com/2012/01/31/random-musings-on-floats-and-encodings/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/56c87cdd18d9a97255c0343c80fba38d?s=96&#38;d=identicon" medium="image">
			<media:title type="html">stevenpigeon</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2012/01/elephant-small.jpg" medium="image">
			<media:title type="html">elephant-small</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2012/01/500px-float_example-svg.png" medium="image">
			<media:title type="html">500px-Float_example.svg</media:title>
		</media:content>
	</item>
		<item>
		<title>Medians (Part III)</title>
		<link>http://hbfs.wordpress.com/2012/01/24/medians-part-iii/</link>
		<comments>http://hbfs.wordpress.com/2012/01/24/medians-part-iii/#comments</comments>
		<pubDate>Tue, 24 Jan 2012 15:31:21 +0000</pubDate>
		<dc:creator>Steven Pigeon</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[C-plus-plus]]></category>
		<category><![CDATA[data structures]]></category>
		<category><![CDATA[hacks]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[heap]]></category>
		<category><![CDATA[max-heap]]></category>
		<category><![CDATA[med-heap]]></category>
		<category><![CDATA[median]]></category>
		<category><![CDATA[min-heap]]></category>
		<category><![CDATA[selection]]></category>

		<guid isPermaLink="false">http://hbfs.wordpress.com/?p=3826</guid>
		<description><![CDATA[So in the two previous parts of this series, we have looked at the selection algorithm and at sorting networks for determining efficiently the (sample) median of a series of values. In this last installment of the series, I consider an efficient (but approximate) algorithm based on heaps to compute the median. A heap is [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hbfs.wordpress.com&amp;blog=4426521&amp;post=3826&amp;subd=hbfs&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>So in the <a href="" target="_blank">two</a> <a href="" target="_blank">previous</a> parts of this series, we have looked at the selection algorithm and at sorting networks for efficiently determining the (sample) median of a series of values.</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/split-rock-small.jpg"><img src="http://hbfs.files.wordpress.com/2011/12/split-rock-small.jpg?w=450" alt="" title="split-rock-small"   class="aligncenter size-full wp-image-3812" /></a></p>
<p>In this last installment of the series, I consider an efficient (but approximate) algorithm based on heaps to compute the median.</p>
<p><span id="more-3826"></span></p>
<p>A <a href="http://en.wikipedia.org/wiki/Heap_%28data_structure%29" target="_blank">heap</a> is an efficient tree-like data structure used to maintain, for example, <a href="http://en.wikipedia.org/wiki/Priority_queue" target="_blank">priority queues</a>. The basic invariant of a max-heap (where we are interested in knowing the largest value; there&#8217;s also a min-heap where we want to know the smallest value) is that every node that is not a leaf contains a key that is at least as large as each of its children&#8217;s keys. If this invariant is respected throughout the heap, then the root contains the maximum value contained in the heap (and, respectively for a min-heap, the minimum value). A max-heap would look something like this:</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/max-heap.png"><img src="http://hbfs.files.wordpress.com/2011/12/max-heap.png?w=300&#038;h=146" alt="" title="max-heap" width="300" height="146" class="aligncenter size-medium wp-image-3829" /></a></p>
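<p>Stored in an array in level order (the children of node <tt>i</tt> sit at <tt>2*i+1</tt> and <tt>2*i+2</tt>), the invariant is easy to check. The helper <tt>is_max_heap</tt> below is a small sketch written for this illustration; the standard library already offers <tt>std::is_heap</tt> and <tt>std::make_heap</tt> for the same purpose.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Checks the max-heap invariant on an array stored in level
// order: the parent of node i (i>0) is at index (i-1)/2.
template <typename T>
bool is_max_heap(const std::vector<T> & v)
 {
  for (std::size_t i=1;i<v.size();i++)
   if (v[(i-1)/2] < v[i]) // parent smaller than child?
    return false;
  return true;
 }
```

<p>After <tt>std::make_heap(v.begin(),v.end())</tt>, <tt>is_max_heap(v)</tt> holds and <tt>v.front()</tt> is the largest value.</p>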
<p>The best part is that you can make any array a heap in linear time. But, how does that help us for the median? Well, we could think of a med-heap, where, at every node (that is not a leaf), the invariant is that the node has a key that is the median amongst itself and its two children! A (very) basic med-heap class would look something like:</p>
<p><pre class="brush: cpp;">
#include &lt;cstddef&gt; // size_t
#include &lt;utility&gt; // std::swap
#include &lt;vector&gt;

template &lt;typename T&gt;
void cmpexchg(T &amp; a, T &amp; b) 
 { 
  using std::swap; // ADL safe
  if (a&gt;b) swap(a,b); 
 }

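// Sorts a, b, c in place with a three-element sorting
// network and returns a reference to the median (now in b).
template &lt;typename T&gt;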
const T &amp; median3( T &amp; a, T &amp; b, T &amp; c )
 {
  cmpexchg(a,c);
  cmpexchg(a,b);
  cmpexchg(b,c);
  return b;
 }

template &lt;typename T&gt;
class med_heap
 {
 private:

  std::vector&lt;T&gt; &amp; heap;

  int left_child(int i) { return 2*i+1; }
  int right_child(int i) { return 2*i+2; }
 
  void heapify()
   {
    const int n=static_cast&lt;int&gt;(heap.size());
    for (int current=n/2-1;current&gt;-1;current--)
     {
      int left=left_child(current);
      int right=right_child(current);

      // check if it has two children
      // (otherwise give up)
      //
      if ( (left &lt; n) &amp;&amp; (right &lt; n) )
       {
        T &amp; a = heap[current];
        T &amp; b = heap[left];
        T &amp; c = heap[right];

        // median3 sorts a,b,c in place, so the
        // median now sits in b (the left child):
        // move it up into the parent.
        median3(a,b,c);
        std::swap(a,b);
       }
      else
       ; // has only one child, so
         // already &quot;median&quot;
     }
   }

 public:

  size_t size() const { return heap.size(); }

  // lets you peek at the (approximate) median
  const T &amp; median() const
   {
    return heap[0];
   }


  med_heap(std::vector&lt;T&gt; &amp; v)
   : heap(v)
   {
    heapify();
   }
 };
</pre></p>
<p>Note that the class does not copy the vector, it references an already existing one (this not only avoids counting the time for allocation and copy, it is also fair to <tt>select</tt>, as it also only uses a reference to a vector).</p>
<p>So here the magic happens in <tt>heapify()</tt>. Using the addressing described <a href="http://hbfs.wordpress.com/2009/04/07/compact-tree-storage/" target="_blank">in a previous post</a>, it becomes simple to scan the array from the middle backwards to the beginning and enforce the invariant at each step. This takes &#8220;small&#8221; linear time, because at each entry, we only need to examine three values.</p>
<p>The problem is that the median of medians is not the median of the whole data, so we trade precision for speed. That may be a good trade-off, as we will see.</p>
<p align="center">*<br />*&emsp;*</p>
<p>OK, the real contenders so far are <tt>select</tt> and the med-heap. We will also compare them to the <tt>std::sort</tt> algorithm, since it is a <em>bona fide</em> baseline: it is 1) simple to use and 2) a likely solution for someone not wanting to reinvent the wheel (or not knowing about selection). The following shows times to find the median in an array of 1000 entries (with 1000 instances of the problem, the same for all three methods), with values in 0&#8230;9999:</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/times.png"><img src="http://hbfs.files.wordpress.com/2011/12/times.png?w=300&#038;h=252" alt="" title="times" width="300" height="252" class="aligncenter size-medium wp-image-3830" /></a></p>
<p>From the <a href="http://en.wikipedia.org/wiki/Box_plot" target="_blank">box plots</a>, we see that <tt>std::sort</tt> fares worse (in fact a lot worse) than the two other alternatives. The med-heap is also significantly faster than <tt>select</tt>, 2-3&times; faster.</p>
<p>What about accuracy? Looking at the distribution of the errors, we see that 50% of the time, the relative error is within ±5%, and 95% of the time it is within ±20%. This distribution seems to be rather indifferent to the range of the values; for example, with a range of 0&#8230;255, you get more or less the same results.</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/errors.png"><img src="http://hbfs.files.wordpress.com/2011/12/errors.png?w=152&#038;h=300" alt="" title="errors" width="152" height="300" class="aligncenter size-medium wp-image-3831" /></a></p>
<p align="center">*<br />*&emsp;*</p>
<p>In essence, therefore, you trade off accuracy (±5% error 50% of the time) for a rather interesting speed-up (2-3&times;) over an exact algorithm such as <tt>select</tt>, which is an interesting result on its own. Now, whether or not a med-heap is adequate for your needs is another story. You could argue that for some applications, it is necessary to have the exact algorithm, and you could make the case that, for another application, the ±5% error 50% of the time is unimportant or unnoticeable.</p>
<p align="center">*<br />*&emsp;*</p>
<p>Here, we only have a rather sketchy implementation of a med-heap that provides only the fun part as far as knowing the median is concerned, but we could, just as easily as with a min- or max-heap, provide the necessary functions to pop the median and to insert and remove values from the med-heap. In fact, it would be exactly the same code as with a min- or max-heap, but for the median instead of min or max.</p>
<p align="center">*<br />*&emsp;*</p>
<p>Full Test Code <a href="http://www.stevenpigeon.org/blogs/hbfs/med-heap.cpp" target="_blank">here</a>.</p>
<p align="center">*<br />*&emsp;*</p>
<p>This is the 300th post.</p>
<p align="center">*<br />*&emsp;*</p>
<p>The STL function <tt>nth_element</tt> seems to be doing a good job at selection (and a better one than my <tt>select</tt>), but is still much slower than the med-heap:</p>
<p><a href="http://hbfs.files.wordpress.com/2012/01/times-with-nth.png"><img src="http://hbfs.files.wordpress.com/2012/01/times-with-nth.png?w=300&#038;h=252" alt="" title="times-with-nth" width="300" height="252" class="aligncenter size-medium wp-image-3938" /></a></p>
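<p>For reference, an exact median through <tt>std::nth_element</tt> takes only a few lines. The wrapper <tt>exact_median</tt> is a sketch written for this note and is not necessarily how the timings above were obtained; it copies the vector because <tt>nth_element</tt> reorders its input, and for an even number of values it returns the upper of the two middle ones.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Exact (upper) median through std::nth_element; takes the
// vector by value because nth_element reorders its input.
template <typename T>
T exact_median(std::vector<T> v)
 {
  assert(!v.empty());
  auto mid = v.begin() + v.size()/2;
  std::nth_element(v.begin(),mid,v.end());
  return *mid;
 }
```

<p>Passing the vector by reference instead would avoid the copy, at the cost of leaving the caller&#8217;s data partially reordered.</p>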
<br />Filed under: <a href='http://hbfs.wordpress.com/category/algorithms/'>algorithms</a>, <a href='http://hbfs.wordpress.com/category/c/'>C</a>, <a href='http://hbfs.wordpress.com/category/c-plus-plus/'>C-plus-plus</a>, <a href='http://hbfs.wordpress.com/category/data-structures/'>data structures</a>, <a href='http://hbfs.wordpress.com/category/hacks/'>hacks</a>, <a href='http://hbfs.wordpress.com/category/programming/'>programming</a> Tagged: <a href='http://hbfs.wordpress.com/tag/heap/'>heap</a>, <a href='http://hbfs.wordpress.com/tag/max-heap/'>max-heap</a>, <a href='http://hbfs.wordpress.com/tag/med-heap/'>med-heap</a>, <a href='http://hbfs.wordpress.com/tag/median/'>median</a>, <a href='http://hbfs.wordpress.com/tag/min-heap/'>min-heap</a>, <a href='http://hbfs.wordpress.com/tag/selection/'>selection</a>]]></content:encoded>
			<wfw:commentRss>http://hbfs.wordpress.com/2012/01/24/medians-part-iii/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/56c87cdd18d9a97255c0343c80fba38d?s=96&#38;d=identicon" medium="image">
			<media:title type="html">stevenpigeon</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/split-rock-small.jpg" medium="image">
			<media:title type="html">split-rock-small</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/max-heap.png?w=300" medium="image">
			<media:title type="html">max-heap</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/times.png?w=300" medium="image">
			<media:title type="html">times</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/errors.png?w=152" medium="image">
			<media:title type="html">errors</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2012/01/times-with-nth.png?w=300" medium="image">
			<media:title type="html">times-with-nth</media:title>
		</media:content>
	</item>
		<item>
		<title>Wallpaper: Frontières imaginées</title>
		<link>http://hbfs.wordpress.com/2012/01/21/wallpaper-frontieres-imaginees/</link>
		<comments>http://hbfs.wordpress.com/2012/01/21/wallpaper-frontieres-imaginees/#comments</comments>
		<pubDate>Sun, 22 Jan 2012 03:06:04 +0000</pubDate>
		<dc:creator>Steven Pigeon</dc:creator>
				<category><![CDATA[Wallpapers]]></category>
		<category><![CDATA[Zen]]></category>
		<category><![CDATA[wallpaper]]></category>
		<category><![CDATA[wallpapers]]></category>

		<guid isPermaLink="false">http://hbfs.wordpress.com/?p=3931</guid>
		<description><![CDATA[You can find more wallpapers here Filed under: Wallpapers, Zen Tagged: wallpaper, wallpapers<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hbfs.wordpress.com&amp;blog=4426521&amp;post=3931&amp;subd=hbfs&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<div id="attachment_3932" class="wp-caption aligncenter" style="width: 310px"><a href="http://hbfs.files.wordpress.com/2012/01/0105.jpg"><img src="http://hbfs.files.wordpress.com/2012/01/0105.jpg?w=300&#038;h=187" alt="" title="0105" width="300" height="187" class="size-medium wp-image-3932" /></a><p class="wp-caption-text">(Frontières imaginées, 1920×1200)</p></div>
<p>You can find more wallpapers <a href="http://www.stevenpigeon.org/Wallpapers/index.html" target="_blank">here</a></p>
<br />Filed under: <a href='http://hbfs.wordpress.com/category/wallpapers/'>Wallpapers</a>, <a href='http://hbfs.wordpress.com/category/zen/'>Zen</a> Tagged: <a href='http://hbfs.wordpress.com/tag/wallpaper/'>wallpaper</a>, <a href='http://hbfs.wordpress.com/tag/wallpapers-2/'>wallpapers</a>]]></content:encoded>
			<wfw:commentRss>http://hbfs.wordpress.com/2012/01/21/wallpaper-frontieres-imaginees/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/56c87cdd18d9a97255c0343c80fba38d?s=96&#38;d=identicon" medium="image">
			<media:title type="html">stevenpigeon</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2012/01/0105.jpg?w=300" medium="image">
			<media:title type="html">0105</media:title>
		</media:content>
	</item>
		<item>
		<title>Wallpaper: Sibérie minimaliste</title>
		<link>http://hbfs.wordpress.com/2012/01/21/wallpaper-siberie-minimaliste/</link>
		<comments>http://hbfs.wordpress.com/2012/01/21/wallpaper-siberie-minimaliste/#comments</comments>
		<pubDate>Sun, 22 Jan 2012 03:03:56 +0000</pubDate>
		<dc:creator>Steven Pigeon</dc:creator>
				<category><![CDATA[Wallpapers]]></category>
		<category><![CDATA[Zen]]></category>
		<category><![CDATA[wallpaper]]></category>
		<category><![CDATA[wallpapers]]></category>

		<guid isPermaLink="false">http://hbfs.wordpress.com/?p=3925</guid>
		<description><![CDATA[You can find more wallpapers here Filed under: Wallpapers, Zen Tagged: wallpaper, wallpapers<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hbfs.wordpress.com&amp;blog=4426521&amp;post=3925&amp;subd=hbfs&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<div id="attachment_3926" class="wp-caption aligncenter" style="width: 310px"><a href="http://hbfs.files.wordpress.com/2012/01/0106.jpg"><img src="http://hbfs.files.wordpress.com/2012/01/0106.jpg?w=300&#038;h=187" alt="" title="0106" width="300" height="187" class="size-medium wp-image-3926" /></a><p class="wp-caption-text">(Sibérie Minimaliste, 1920×1200)</p></div>
<p>You can find more wallpapers <a href="http://www.stevenpigeon.org/Wallpapers/index.html" target="_blank">here</a></p>
<br />Filed under: <a href='http://hbfs.wordpress.com/category/wallpapers/'>Wallpapers</a>, <a href='http://hbfs.wordpress.com/category/zen/'>Zen</a> Tagged: <a href='http://hbfs.wordpress.com/tag/wallpaper/'>wallpaper</a>, <a href='http://hbfs.wordpress.com/tag/wallpapers-2/'>wallpapers</a>]]></content:encoded>
			<wfw:commentRss>http://hbfs.wordpress.com/2012/01/21/wallpaper-siberie-minimaliste/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/56c87cdd18d9a97255c0343c80fba38d?s=96&#38;d=identicon" medium="image">
			<media:title type="html">stevenpigeon</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2012/01/0106.jpg?w=300" medium="image">
			<media:title type="html">0106</media:title>
		</media:content>
	</item>
		<item>
		<title>Lossless Coding of CD Audio</title>
		<link>http://hbfs.wordpress.com/2012/01/17/lossless-coding-of-cd-audio/</link>
		<comments>http://hbfs.wordpress.com/2012/01/17/lossless-coding-of-cd-audio/#comments</comments>
		<pubDate>Tue, 17 Jan 2012 15:18:57 +0000</pubDate>
		<dc:creator>Steven Pigeon</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[Bash (Shell)]]></category>
		<category><![CDATA[data compression]]></category>
		<category><![CDATA[embedded programming]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[CD]]></category>
		<category><![CDATA[Entropy]]></category>
		<category><![CDATA[Golomb Codes]]></category>
		<category><![CDATA[GSM]]></category>
		<category><![CDATA[Linear Prediction]]></category>
		<category><![CDATA[music]]></category>
		<category><![CDATA[psychoacoustic model]]></category>
		<category><![CDATA[psychoacoustics]]></category>
		<category><![CDATA[sound]]></category>
		<category><![CDATA[Speech]]></category>
		<category><![CDATA[Speech Codec]]></category>

		<guid isPermaLink="false">http://hbfs.wordpress.com/?p=3872</guid>
		<description><![CDATA[Once upon a time, I discussed how to pick bit-rate for MP3, while considering re-ripping all my CDs. But if I&#8217;m to re-rip everything, I might as well do it one last time and use lossless compression. In this post, we&#8217;ll discuss the simple script I cooked up to do just that, and a bit [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hbfs.wordpress.com&amp;blog=4426521&amp;post=3872&amp;subd=hbfs&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Once upon a time, I discussed how to <a href="http://hbfs.wordpress.com/2010/04/06/picking-a-bit-rate-for-mp3-files/" target="_blank">pick bit-rate</a> for MP3, while considering re-ripping all my CDs. But if I&#8217;m to re-rip everything, I might as well do it <em>one last time</em> and use <a href="http://en.wikipedia.org/wiki/Lossless_data_compression" target="_blank">lossless compression</a>.</p>
<p><a href="http://hbfs.files.wordpress.com/2010/02/gramophone_19142.png"><img src="http://hbfs.files.wordpress.com/2010/02/gramophone_19142.png?w=121&#038;h=150" alt="" title="Gramophone_1914" width="121" height="150" class="aligncenter size-thumbnail wp-image-2165" /></a></p>
<p>In this post, we&#8217;ll discuss the simple script I cooked up to do just that, and a bit on how <a href="http://en.wikipedia.org/wiki/FLAC" target="_blank">Flac</a> works, and how it compares to MP3.</p>
<p><span id="more-3872"></span></p>
<p>First, to run the script, you will need <tt>cdparanoia</tt> and <tt>flac</tt>, both packages readily available from Debian/Ubuntu repositories (and, for the MP3 version of the script, found at the end of this post, you will need <tt>lame</tt>, also found in the default repositories).</p>
<p>The first thing to do is to probe the CD for its Table Of Contents (TOC hereafter). Fortunately, <tt>cdparanoia</tt> is a fine piece of software and it detects the default/loaded CD/DVD drive. It suffices to invoke it with the query option, <tt>-Q</tt>, like this:</p>
<p><pre class="brush: bash;">
echo probing...

if cdparanoia -Q 2&gt; cd-rip-out.tmp~
then
...
</pre></p>
<p>This finds the drive and fetches the TOC, here written to a temporary file. Redirecting <tt>stderr</tt> is needed because <tt>cdparanoia</tt> writes its report to <tt>stderr</tt> (for some reason). This temporary file is then parsed to show the tracks and get their numbers:</p>
<p><pre class="brush: bash;">
grep -e '^[[:blank:]]*[0-9]*\.' cd-rip-out.tmp~ &gt; cd-rip-toc.tmp~
tracks=$( cut -d. -f 1 cd-rip-toc.tmp~ )
</pre></p>
<p>(The exact <a href="http://en.wikipedia.org/wiki/Poutine" target="_blank">poutine</a> is <tt>cdparanoia</tt>-specific.) In short, it extracts the track numbers and forms a list such as <tt>1 2 3 4</tt>. Later on, the user can either choose the default list or specify their own, including <tt>cdparanoia</tt>-specific syntax such as <tt>12-15</tt> to grab tracks 12 to 15, inclusive, as a single track. We also add a couple of cosmetic details, such as promoting track numbers from <tt>1</tt> to <tt>001</tt> (so that they all line up all pretty), ask for the track title, and invoke <tt>flac</tt>:</p>
<p><pre class="brush: bash; wrap-lines: false;">
    # main grab loop
    for track in ${track_list[@]}
    do
        ((t++))
        # sed hack from Vincent &quot;gnuvince&quot; Foley
        beautiful_track=$(echo $track \
        | sed 's/\([0-9]*\)/000\1/g;s/[0-9]*\([0-9]\{3\}\)/\1/g')

        read -r -p &quot;track $beautiful_track ($t/$t_l) title: &quot; this_track_title
        this_track_name=&quot;$artist -- $album - $beautiful_track - $this_track_title&quot;

        if cdparanoia --never-skip=2 $track -w &quot;$this_track_name.wav&quot;
        then
            # grab successful!

            flac --best \
                &quot;$this_track_name.wav&quot; \
                --tag=artist=&quot;$artist&quot; \
                --tag=album=&quot;$album&quot; \
                --tag=track=&quot;$beautiful_track&quot; \
                --tag=title=&quot;$beautiful_track $this_track_title&quot; \
                --tag=year=&quot;$year&quot;
        else
            echo An error occurred while grabbing $track
        fi
    done
</pre></p>
<p>(The ugly sed hack to convert track numbers from 1 to 001 is due to <a href="http://gnuvince.wordpress.com/" target="_blank">gnuvince</a>.)</p>
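<p>As an aside, here is a minimal sketch of an alternative to the sed hack, assuming plain track numbers only (ranges such as <tt>12-15</tt> would still need the sed version). The helper name <tt>pad_track</tt> is made up for this illustration and is not part of the script; it relies on the shell built-in <tt>printf</tt>:</p>
<p><pre class="brush: bash;">
# pad_track is a hypothetical helper, not part of the original script
pad_track() {
    # 10#$1 forces base 10, so that 08 is not rejected as invalid octal
    printf '%03d\n' &quot;$((10#$1))&quot;
}

for track in 1 12 123
do
    pad_track &quot;$track&quot;
done
# prints 001, 012 and 123, one per line
</pre></p>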
<p>Flac provides many options, but <tt>--best</tt> (a synonym for <tt>-8</tt>, the highest compression level) does indeed seem to provide very good parameters, favoring compression regardless of CPU time. And, let us be serious, it&#8217;s rather unimportant that it takes &#8220;more&#8221; CPU time when it takes about 30s to compress an entire CD on a modern CPU, once the sound is ripped from the CD.</p>
<p align="center">*<br />*&emsp;*</p>
<p>Flac encodes sound losslessly, that is, if you decompress a <tt>.flac</tt> file, you get, bit for bit, the <tt>.wav</tt> you input. MP3, on the other hand, is a conspicuously lossy encoder, that is, it <em>destroys</em> information in order to achieve much higher compression ratios. One essential piece of a modern MP3 encoder is its psycho-acoustic model, an engine that analyzes the sound and estimates, given the quantization, the perceived loss incurred by destroying this or that feature of the sound. Needless to say, the better the model, the better the sound quality can be, because the encoder will be able to make the right decisions about what information to destroy: the information you are least likely to hear.</p>
<p>But at very low bit-rates, it&#8217;s the quantization/decimation engine that dominates. To meet the very low bit-rates, the encoder must destroy plenty of information, and even if the psycho-acoustic model helps it make the right decisions while coding, the result will still be a sound with lots of artifacts, and it will not sound that good. The effects will be noticeable or even irritating. At high bit-rates, however, it is the psycho-acoustic model that dominates. There is no need to fiercely quantize and decimate, but the psycho-acoustic model can still force the encoder to remove features of the sound that are deemed inaudible (according to the model) <em>despite having enough bits available to code them</em>. In this regime, cranking up the bit-rate results in only a marginal increase in file size, because the extra bits you allow the encoder go unused once the model has removed those features from the sound.</p>
<p>We have seen this before, haven&#8217;t we?</p>
<div id="attachment_2168" class="wp-caption aligncenter" style="width: 160px"><a href="http://hbfs.files.wordpress.com/2010/02/relative-filesizes.png"><img src="http://hbfs.files.wordpress.com/2010/02/relative-filesizes.png?w=150&#038;h=94" alt="" title="relative-filesizes" width="150" height="94" class="size-thumbnail wp-image-2168" /></a><p class="wp-caption-text">Relative file sizes for --vbr-new</p></div>
<p>In the graph, we can see that some files do not grow as the bit-rate increases, while others grow only somewhat; in any case, the growth is quite sub-linear.</p>
<p align="center">*<br />*&emsp;*</p>
<p>With lossless coding, we keep <em>everything</em>, so it is crucial that we have a good prediction model, that is, an engine that can give a good prediction of the next sample <img src='http://s0.wp.com/latex.php?latex=x_t&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='x_t' title='x_t' class='latex' /> from the previous <img src='http://s0.wp.com/latex.php?latex=w&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='w' title='w' class='latex' /> samples, <img src='http://s0.wp.com/latex.php?latex=x_%7Bt-1%7D%2C+x_%7Bt-2%7D%2C+%5Cldots%2C+x_%7Bt-w%7D&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='x_{t-1}, x_{t-2}, &#92;ldots, x_{t-w}' title='x_{t-1}, x_{t-2}, &#92;ldots, x_{t-w}' class='latex' />. If the error <img src='http://s0.wp.com/latex.php?latex=x_t-%5Chat%7Bx%7D_t&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='x_t-&#92;hat{x}_t' title='x_t-&#92;hat{x}_t' class='latex' /> is small in <a href="http://en.wikipedia.org/wiki/Expected_value" target="_blank">expectation</a> (well, to be exact, if it has a small <a href="http://en.wikipedia.org/wiki/Entropy_%28information_theory%29" target="_blank">entropy</a>), then we can devise an efficient code for it and get good compression.</p>
<p>Flac does not use an elaborate sound model. It uses <a href="http://en.wikipedia.org/wiki/Linear_prediction" target="_blank">linear prediction</a> to predict values. Linear prediction can be written as:</p>
<p align="center"><img src='http://s0.wp.com/latex.php?latex=%5Chat%7Bx%7D_t+%3D+%5Csum_%7Bi%3D1%7D%5E%7Bw%7Da_i+x_%7Bt-i%7D&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;hat{x}_t = &#92;sum_{i=1}^{w}a_i x_{t-i}' title='&#92;hat{x}_t = &#92;sum_{i=1}^{w}a_i x_{t-i}' class='latex' /></p>
<p>(or, in vector notation, <img src='http://s0.wp.com/latex.php?latex=%5Chat%7Bx%7D_t%3Da%5ETx_%7Bt-1%7D%5E%7Bt-w%7D&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;hat{x}_t=a^Tx_{t-1}^{t-w}' title='&#92;hat{x}_t=a^Tx_{t-1}^{t-w}' class='latex' />; it <em>is</em> only a dot-product, after all.) At each time <img src='http://s0.wp.com/latex.php?latex=t&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='t' title='t' class='latex' />, a new set of values <img src='http://s0.wp.com/latex.php?latex=a_i&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='a_i' title='a_i' class='latex' /> is computed in order to minimize the expected squared error <img src='http://s0.wp.com/latex.php?latex=%28%5Chat%7Bx%7D_%7Bt%2B1%7D-x_%7Bt%2B1%7D%29%5E2&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='(&#92;hat{x}_{t+1}-x_{t+1})^2' title='(&#92;hat{x}_{t+1}-x_{t+1})^2' class='latex' /> (there are possible variants) for the next time step; in practice, Flac computes the coefficients once per block of samples rather than at every sample. It turns out that this is not too intensive if the window is small-ish (especially for somewhat recent CPUs), because the <img src='http://s0.wp.com/latex.php?latex=w%5Ctimes%7Bw%7D&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='w&#92;times{w}' title='w&#92;times{w}' class='latex' /> matrix you need to invert is of a special kind: it is a <a href="http://en.wikipedia.org/wiki/Toeplitz_matrix" target="_blank">Toeplitz Matrix</a>.
Because of the structure of the matrix, it has <img src='http://s0.wp.com/latex.php?latex=O%28w%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(w)' title='O(w)' class='latex' /> degrees of freedom rather than <img src='http://s0.wp.com/latex.php?latex=O%28w%5E2%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(w^2)' title='O(w^2)' class='latex' />, and the inversion of a Toeplitz matrix (or solving a Toeplitz system; same difference) can be performed in <em>much</em> better than <img src='http://s0.wp.com/latex.php?latex=O%28w%5E3%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(w^3)' title='O(w^3)' class='latex' />: it can be done in <img src='http://s0.wp.com/latex.php?latex=O%28w%5E2%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(w^2)' title='O(w^2)' class='latex' />!</p>
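<p>To make the formula concrete, here is a small sketch of linear prediction with a window of two samples, in integer arithmetic. The samples and the fixed coefficients (2 and -1) are assumptions chosen for the illustration, whereas a real encoder fits the coefficients to the signal; the point is only to show how prediction turns slowly varying samples into small residuals:</p>
<p><pre class="brush: bash;">
# made-up samples; the first two only serve to warm up the predictor
x=(10 12 15 19 22 24 25)

# assumed fixed coefficients (a real encoder would fit them to the signal)
a1=2
a2=-1

e=()
for ((t = 2; t &lt; ${#x[@]}; t++))
do
    # prediction from the two previous samples: a1*x[t-1] + a2*x[t-2]
    prediction=$(( a1 * x[t-1] + a2 * x[t-2] ))
    e+=( $(( x[t] - prediction )) )
done

echo &quot;residuals: ${e[*]}&quot;
# prints: residuals: 1 1 -1 -1 -1
</pre></p>
<p>The samples range from 10 to 25, but the residuals stay between -1 and 1, and it is those small values that a short code can exploit.</p>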
<p>Linear predictive coding isn&#8217;t new: it has been used in various <a href="http://en.wikipedia.org/wiki/Speech_coding" target="_blank">speech coding</a> schemes (such as the <a href="http://en.wikipedia.org/wiki/GSM#Voice_codecs" target="_blank">GSM Speech Codecs</a>). What makes it attractive is that it works fairly well on speech and sound, and that there are efficient algorithms to solve for the parameters of the linear prediction.</p>
<p align="center">*<br />*&emsp;*</p>
<p>Once the prediction <img src='http://s0.wp.com/latex.php?latex=%5Chat%7Bx%7D_t&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;hat{x}_t' title='&#92;hat{x}_t' class='latex' /> is done, the difference <img src='http://s0.wp.com/latex.php?latex=x_t-%5Chat%7Bx%7D_t&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='x_t-&#92;hat{x}_t' title='x_t-&#92;hat{x}_t' class='latex' /> is coded. In Flac, it seems that this difference is considered to be <a href="http://en.wikipedia.org/wiki/Geometric_distribution" target="_blank">geometrically distributed</a>, and accordingly encoded using a <a href="http://en.wikipedia.org/wiki/Golomb_coding" target="_blank">Golomb Code</a> (more precisely a Rice code, the special case where the Golomb parameter is a power of two). It is the <em>exact</em> encoding of the residual <img src='http://s0.wp.com/latex.php?latex=e_t%3Dx_t-%5Chat%7Bx%7D_t&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='e_t=x_t-&#92;hat{x}_t' title='e_t=x_t-&#92;hat{x}_t' class='latex' /> that makes exact recovery of the original signal possible: indeed, <img src='http://s0.wp.com/latex.php?latex=x_t%3D%5Chat%7Bx%7D_t%2Be_t&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='x_t=&#92;hat{x}_t+e_t' title='x_t=&#92;hat{x}_t+e_t' class='latex' />.</p>
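<p>As an illustration of how such a code works, here is a sketch of a Rice code, that is, a Golomb code whose parameter is a power of two. The function names, the zigzag mapping of signed residuals to non-negative integers, and the bit layout (quotient in unary, then the remainder on k bits) are assumptions made for the example; it is not meant to reproduce the Flac bitstream:</p>
<p><pre class="brush: bash;">
# zigzag and rice_encode are hypothetical names used for this sketch

# map a signed residual to a non-negative integer:
# 0 -1 1 -2 2 ... becomes 0 1 2 3 4 ...
zigzag() {
    local e=$1
    if (( e &gt;= 0 )); then
        echo $(( 2 * e ))
    else
        echo $(( -2 * e - 1 ))
    fi
}

# Rice code of u with parameter k: the quotient u / 2^k in unary
# (that many 1s, then a 0), followed by the remainder on k bits
rice_encode() {
    local u=$1 k=$2 q r i bits=''
    q=$(( u &gt;&gt; k ))
    r=$(( u &amp; ((1 &lt;&lt; k) - 1) ))
    for ((i = 0; i &lt; q; i++)); do bits+=1; done
    bits+=0
    for ((i = k - 1; i &gt;= 0; i--)); do bits+=$(( (r &gt;&gt; i) &amp; 1 )); done
    echo &quot;$bits&quot;
}

k=2
for e in 1 -1 7
do
    u=$(zigzag &quot;$e&quot;)
    echo &quot;$e: $(rice_encode &quot;$u&quot; &quot;$k&quot;)&quot;
done
# prints 1: 010, then -1: 001, then 7: 111010
</pre></p>
<p>Small residuals get short codes (three bits here), while large ones pay for their size in the unary part, which is what suits a geometric distribution.</p>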
<p align="center">*<br />*&emsp;*</p>
<p>You can download the full scripts <a href="http://www.stevenpigeon.org/blogs/hbfs/cd-rip.sh" target="_blank">here for Flac</a> and <a href="http://www.stevenpigeon.org/blogs/hbfs/cd-rip-mp3.sh" target="_blank">here for MP3</a> high bit-rate VBR. Feel free to tinker with them; that&#8217;s what they&#8217;re for.</p>
<p align="center">*<br />*&emsp;*</p>
<p>The <a href="http://flac.sourceforge.net/" target="_blank">Flac</a> homepage is on SourceForge; follow the link to learn more about Flac.</p>
<br />Filed under: <a href='http://hbfs.wordpress.com/category/algorithms/'>algorithms</a>, <a href='http://hbfs.wordpress.com/category/bash-shell/'>Bash (Shell)</a>, <a href='http://hbfs.wordpress.com/category/data-compression/'>data compression</a>, <a href='http://hbfs.wordpress.com/category/embedded-programming/'>embedded programming</a>, <a href='http://hbfs.wordpress.com/category/mathematics/'>Mathematics</a>, <a href='http://hbfs.wordpress.com/category/programming/'>programming</a> Tagged: <a href='http://hbfs.wordpress.com/tag/cd/'>CD</a>, <a href='http://hbfs.wordpress.com/tag/entropy/'>Entropy</a>, <a href='http://hbfs.wordpress.com/tag/golomb-codes/'>Golomb Codes</a>, <a href='http://hbfs.wordpress.com/tag/gsm/'>GSM</a>, <a href='http://hbfs.wordpress.com/tag/linear-prediction/'>Linear Prediction</a>, <a href='http://hbfs.wordpress.com/tag/music/'>music</a>, <a href='http://hbfs.wordpress.com/tag/psychoacoustic-model/'>psychoacoustic model</a>, <a href='http://hbfs.wordpress.com/tag/psychoacoustics/'>psychoacoustics</a>, <a href='http://hbfs.wordpress.com/tag/sound/'>sound</a>, <a href='http://hbfs.wordpress.com/tag/speech/'>Speech</a>, <a href='http://hbfs.wordpress.com/tag/speech-codec/'>Speech Codec</a>]]></content:encoded>
			<wfw:commentRss>http://hbfs.wordpress.com/2012/01/17/lossless-coding-of-cd-audio/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/56c87cdd18d9a97255c0343c80fba38d?s=96&#38;d=identicon" medium="image">
			<media:title type="html">stevenpigeon</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2010/02/gramophone_19142.png?w=121" medium="image">
			<media:title type="html">Gramophone_1914</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2010/02/relative-filesizes.png?w=150" medium="image">
			<media:title type="html">relative-filesizes</media:title>
		</media:content>
	</item>
		<item>
		<title>Suggested Reading: National Audubon Society Guide To Photographing National Parks</title>
		<link>http://hbfs.wordpress.com/2012/01/15/suggested-reading-national-audubon-society-guite-to-photographing-national-parks/</link>
		<comments>http://hbfs.wordpress.com/2012/01/15/suggested-reading-national-audubon-society-guite-to-photographing-national-parks/#comments</comments>
		<pubDate>Mon, 16 Jan 2012 04:37:34 +0000</pubDate>
		<dc:creator>Steven Pigeon</dc:creator>
				<category><![CDATA[Suggested Reading]]></category>
		<category><![CDATA[Audubon]]></category>
		<category><![CDATA[Landscape]]></category>
		<category><![CDATA[National Audubon]]></category>
		<category><![CDATA[National Parks]]></category>
		<category><![CDATA[Photography]]></category>

		<guid isPermaLink="false">http://hbfs.wordpress.com/?p=3911</guid>
		<description><![CDATA[Tim Fitzharris &#8212;&#160;National Audubon Society Guide To Photographing National Parks (Digital Edition)&#160;&#8212; Firefly Books 2009, 192 pp. ISBN&#160;978-1-55407-455-6 Contrary to what the title may indicate, this book has little to do with actual digital photography techniques for landscapes or natural wonders; it is rather a tourist guide of everything you should see in the major [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hbfs.wordpress.com&amp;blog=4426521&amp;post=3911&amp;subd=hbfs&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Tim Fitzharris &mdash;&nbsp;<a href="http://www.amazon.com/gp/product/155407455X/ref=as_li_qf_sp_asin_tl?ie=UTF8&amp;tag=hardbettfasts-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=155407455X" target="_blank"><i>National Audubon Society Guide To Photographing National Parks (Digital Edition)</i></a>&nbsp;&mdash; Firefly Books 2009, 192 pp. ISBN&nbsp;978-1-55407-455-6</p>
<div id="attachment_3914" class="wp-caption aligncenter" style="width: 150px"><a href="http://www.amazon.com/gp/product/155407455X/ref=as_li_qf_sp_asin_tl?ie=UTF8&amp;tag=hardbettfasts-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=155407455X"><img src="http://hbfs.files.wordpress.com/2012/01/audubon-national-parks.jpg?w=450" alt="" title="audubon-national-parks"   class="size-full wp-image-3914" /></a><p class="wp-caption-text">(Buy At Amazon.com)</p></div>
<p>Contrary to what the title may indicate, this book has little to do with actual digital photography techniques for landscapes or natural wonders; it is rather a tourist guide to everything you should see in the major American national parks. After quickly covering the basics of landscape photography, the guide takes you on a detailed journey through the major national parks of the United States, giving you all the good hints on how to reach this-or-that natural wonder, and on what time of year, or even what time of day, lends itself best to photography.</p>
<p>While this seems moderately interesting for a foreigner (especially if one doesn&#8217;t really want to visit all the state parks), the book is entirely redeemed by Fitzharris&#8217; <em>absolutely superb</em> photography. A must get, especially if you&#8217;re interested in landscape photography.</p>
<br />Filed under: <a href='http://hbfs.wordpress.com/category/suggested-reading/'>Suggested Reading</a> Tagged: <a href='http://hbfs.wordpress.com/tag/audubon/'>Audubon</a>, <a href='http://hbfs.wordpress.com/tag/landscape/'>Landscape</a>, <a href='http://hbfs.wordpress.com/tag/national-audubon/'>National Audubon</a>, <a href='http://hbfs.wordpress.com/tag/national-parks/'>National Parks</a>, <a href='http://hbfs.wordpress.com/tag/photography/'>Photography</a>]]></content:encoded>
			<wfw:commentRss>http://hbfs.wordpress.com/2012/01/15/suggested-reading-national-audubon-society-guite-to-photographing-national-parks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/56c87cdd18d9a97255c0343c80fba38d?s=96&#38;d=identicon" medium="image">
			<media:title type="html">stevenpigeon</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2012/01/audubon-national-parks.jpg" medium="image">
			<media:title type="html">audubon-national-parks</media:title>
		</media:content>
	</item>
		<item>
		<title>Suggested Reading: The Practical Pyromaniac</title>
		<link>http://hbfs.wordpress.com/2012/01/15/suggested-reading-the-practical-pyromaniac/</link>
		<comments>http://hbfs.wordpress.com/2012/01/15/suggested-reading-the-practical-pyromaniac/#comments</comments>
		<pubDate>Mon, 16 Jan 2012 04:23:00 +0000</pubDate>
		<dc:creator>Steven Pigeon</dc:creator>
				<category><![CDATA[Suggested Reading]]></category>
		<category><![CDATA[flame-thrower]]></category>
		<category><![CDATA[mischief]]></category>
		<category><![CDATA[pyromaniac]]></category>

		<guid isPermaLink="false">http://hbfs.wordpress.com/?p=3903</guid>
		<description><![CDATA[William Gurstelle &#8212;&#160;The Practical Pyromaniac: Build Fire Tornadoes, One-Candlepower Engines, Great Balls of Fire, and More Incendiary Devices&#160;&#8212; Chicago Review Press, 2011, 212 pp. ISBN&#160;978-1-56976-710-8 This book is quite the step up from Mini Weapons Of Mass Destruction 2: not only the projects presented are a lot more interesting&#8212;flame-thrower and all&#8212;they are presented in their [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hbfs.wordpress.com&amp;blog=4426521&amp;post=3903&amp;subd=hbfs&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>William Gurstelle &mdash;&nbsp;<i><a href="http://www.amazon.com/gp/product/1569767106/ref=as_li_qf_sp_asin_tl?ie=UTF8&amp;tag=hardbettfasts-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=1569767106" target="_blank">The Practical Pyromaniac</a>: Build Fire Tornadoes, One-Candlepower Engines, Great Balls of Fire, and More Incendiary Devices</i>&nbsp;&mdash; Chicago Review Press, 2011, 212 pp. ISBN&nbsp;978-1-56976-710-8</p>
<div id="attachment_3905" class="wp-caption aligncenter" style="width: 150px"><a href="http://www.amazon.com/gp/product/1569767106/ref=as_li_qf_sp_asin_tl?ie=UTF8&amp;tag=hardbettfasts-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=1569767106"><img src="http://hbfs.files.wordpress.com/2012/01/practical-pyro.jpg?w=450" alt="" title="practical-pyro"   class="size-full wp-image-3905" /></a><p class="wp-caption-text">(Buy At Amazon)</p></div>
<p>This book is quite the step up from <a href="http://hbfs.wordpress.com/2011/12/30/suggested-reading-mini-weapons-of-mass-destruction-2/" target="_blank">Mini Weapons Of Mass Destruction 2</a>: not only are the projects presented a lot more interesting (flame-thrower and all), but they are also presented in their historical and scientific contexts, with explanations of the active principles of each device/contraption. While <i>Mini Weapons</i> was something of a kid&#8217;s book, <a href="http://www.amazon.com/gp/product/1569767106/ref=as_li_qf_sp_asin_tl?ie=UTF8&amp;tag=hardbettfasts-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=1569767106" target="_blank"><i>The Practical Pyromaniac</i></a> certainly <em>isn&#8217;t</em>. I must say it will appeal quite a lot to everyone with a little <em>envie de destruction</em>.</p>
<br />Filed under: <a href='http://hbfs.wordpress.com/category/suggested-reading/'>Suggested Reading</a> Tagged: <a href='http://hbfs.wordpress.com/tag/flame-thrower/'>flame-thrower</a>, <a href='http://hbfs.wordpress.com/tag/mischief/'>mischief</a>, <a href='http://hbfs.wordpress.com/tag/pyromaniac/'>pyromaniac</a>]]></content:encoded>
			<wfw:commentRss>http://hbfs.wordpress.com/2012/01/15/suggested-reading-the-practical-pyromaniac/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/56c87cdd18d9a97255c0343c80fba38d?s=96&#38;d=identicon" medium="image">
			<media:title type="html">stevenpigeon</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2012/01/practical-pyro.jpg" medium="image">
			<media:title type="html">practical-pyro</media:title>
		</media:content>
	</item>
		<item>
		<title>Medians (Part II)</title>
		<link>http://hbfs.wordpress.com/2012/01/10/medians-part-ii/</link>
		<comments>http://hbfs.wordpress.com/2012/01/10/medians-part-ii/#comments</comments>
		<pubDate>Tue, 10 Jan 2012 15:42:46 +0000</pubDate>
		<dc:creator>Steven Pigeon</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[data structures]]></category>
		<category><![CDATA[C-plus-plus]]></category>
		<category><![CDATA[median]]></category>
		<category><![CDATA[Sorting Networks]]></category>
		<category><![CDATA[ADL]]></category>
		<category><![CDATA[cmpxchg]]></category>

		<guid isPermaLink="false">http://hbfs.wordpress.com/?p=3815</guid>
		<description><![CDATA[In the previous post of this series, we left off where we were asking ourselves if there was a better way than the selection algorithm of finding the median. Computing the median of three numbers is as simple as sorting the three numbers (an operation that can be done in constant time, after all, if [...]]]></description>
			<content:encoded><![CDATA[<p>In the <a href="http://hbfs.wordpress.com/2011/12/27/medians-part-i/" target="_blank">previous post of this series</a>, we left off where we were asking ourselves if there was a better way than the <tt>selection</tt> algorithm of finding the median.</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/split-rock-small.jpg"><img src="http://hbfs.files.wordpress.com/2011/12/split-rock-small.jpg?w=450" alt="" title="split-rock-small"   class="aligncenter size-full wp-image-3812" /></a></p>
<p>Computing the median of three numbers is as simple as sorting the three numbers (an operation that can be done in constant time, after all, if comparing and swapping are constant time) and picking the middle one. However, if the objects compared are &#8220;heavy&#8221;, comparing and (especially) moving them around may be expensive.</p>
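<p>As a baseline, the sort-and-pick-the-middle approach could be sketched as follows. The name <tt>median3_by_sort</tt> is made up for this illustration, and the function deliberately takes its arguments by value, which is exactly the copying and moving that becomes expensive for heavy objects.</p>

```cpp
#include <algorithm>
#include <array>

// Median of three by brute force: copy the values, sort them, take the middle.
// Hypothetical helper for illustration; the arguments are copied on purpose.
template <typename T>
T median3_by_sort(T a, T b, T c)
 {
  std::array<T, 3> v{{a, b, c}};
  std::sort(v.begin(), v.end()); // constant amount of work for a fixed size of 3
  return v[1];                   // the middle element is the median
 }
```

<p>For three integers this is perfectly fine; the rest of the post is about what to do when copying <tt>T</tt> is not cheap.</p>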
<p><span id="more-3815"></span></p>
<p>One possibility is to use comparisons, but <em>no</em> swapping, to find the median. Basically, for three numbers <img src='http://s0.wp.com/latex.php?latex=a&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='a' title='a' class='latex' />, <img src='http://s0.wp.com/latex.php?latex=b&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='b' title='b' class='latex' />, <img src='http://s0.wp.com/latex.php?latex=c&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='c' title='c' class='latex' />, we want to know which one of the following six arrangements</p>
<p align="center"><img src='http://s0.wp.com/latex.php?latex=a+%5Cleqslant+b+%5Cleqslant+c&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='a &#92;leqslant b &#92;leqslant c' title='a &#92;leqslant b &#92;leqslant c' class='latex' />,</p>
<p align="center"><img src='http://s0.wp.com/latex.php?latex=a+%5Cleqslant+c+%5Cleqslant+b&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='a &#92;leqslant c &#92;leqslant b' title='a &#92;leqslant c &#92;leqslant b' class='latex' />,</p>
<p align="center"><img src='http://s0.wp.com/latex.php?latex=b+%5Cleqslant+a+%5Cleqslant+c&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='b &#92;leqslant a &#92;leqslant c' title='b &#92;leqslant a &#92;leqslant c' class='latex' />,</p>
<p align="center"><img src='http://s0.wp.com/latex.php?latex=b+%5Cleqslant+c+%5Cleqslant+a&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='b &#92;leqslant c &#92;leqslant a' title='b &#92;leqslant c &#92;leqslant a' class='latex' />,</p>
<p align="center"><img src='http://s0.wp.com/latex.php?latex=c+%5Cleqslant+a+%5Cleqslant+b&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='c &#92;leqslant a &#92;leqslant b' title='c &#92;leqslant a &#92;leqslant b' class='latex' />,</p>
<p>or</p>
<p align="center"><img src='http://s0.wp.com/latex.php?latex=c+%5Cleqslant+b+%5Cleqslant+a&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='c &#92;leqslant b &#92;leqslant a' title='c &#92;leqslant b &#92;leqslant a' class='latex' /></p>
<p>holds.</p>
<p>The goal is therefore to devise a testing procedure that determines, in an optimal number of steps, which one of the above six arrangements corresponds to the actual values of <img src='http://s0.wp.com/latex.php?latex=a&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='a' title='a' class='latex' />, <img src='http://s0.wp.com/latex.php?latex=b&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='b' title='b' class='latex' />, and <img src='http://s0.wp.com/latex.php?latex=c&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='c' title='c' class='latex' />. We can proceed by elimination. Testing any one pair, say <img src='http://s0.wp.com/latex.php?latex=a%5Cleqslant+b&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='a&#92;leqslant b' title='a&#92;leqslant b' class='latex' />, will split the above six arrangements into two groups: one in which <img src='http://s0.wp.com/latex.php?latex=a%5Cleqslant+b&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='a&#92;leqslant b' title='a&#92;leqslant b' class='latex' /> holds (with three arrangements) and one in which it does not (also with three arrangements). Each group can be further divided by asking another question, say, <img src='http://s0.wp.com/latex.php?latex=b%5Cleqslant+c&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='b&#92;leqslant c' title='b&#92;leqslant c' class='latex' />, which also creates two sub-groups, and so on, until all groups have been broken down to exactly one arrangement. In C++, that would look pretty much like:</p>
<p><pre class="brush: cpp;">
// Returns a reference to one of its arguments:
// nothing is copied or swapped.
template &lt;typename T&gt;
const T &amp; median3( const T &amp; a,
                   const T &amp; b,
                   const T &amp; c )
 {
  if (a&lt;b) // {a,b,c}, {a,c,b}, {c,a,b}
   {
    if (b&lt;c)
     return b; // {a,b,c}
    else // {a,c,b}, {c,a,b}
     {
      if (a&lt;c)
       return c; // {a,c,b}
      else
       return a; // {c,a,b}
     }
   }
  else // {b,a,c}, {b,c,a}, {c,b,a}
   {
    if (b&lt;c) // {b,a,c}, {b,c,a}
     {
      if (a&lt;c)
       return a; // {b,a,c}
      else
       return c; // {b,c,a}
     }
    else
     return b; // {c,b,a}
   }
 }
</pre></p>
<p>You understood that the method works (it asks essentially <img src='http://s0.wp.com/latex.php?latex=O%28n+%5Clg+n%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(n &#92;lg n)' title='O(n &#92;lg n)' class='latex' /> questions to tell apart all the arrangements of <img src='http://s0.wp.com/latex.php?latex=n&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n' title='n' class='latex' /> numbers, and it does not move objects around), but also that the resulting if-tree grows very rapidly with the number of items to sort, since it needs one leaf per arrangement. That is, good luck if you have to find the median of 25 numbers.</p>
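<p>To get a feel for that growth: an if-tree that, like the one above, tells every arrangement apart needs one leaf per arrangement, and a binary tree with that many leaves cannot be shallower than the base-2 logarithm of the leaf count, rounded up. The sketch below computes both numbers; the function names are invented for this example, and it assumes at most 20 items so that the factorial fits in 64 bits.</p>

```cpp
#include <cstdint>

// Number of leaves of an if-tree that tells all arrangements of n items apart,
// that is, n factorial. Assumes n <= 20 so the result fits in 64 bits.
std::uint64_t arrangements(unsigned n)
 {
  std::uint64_t f = 1;
  for (unsigned i = 2; i <= n; i++) f *= i;
  return f;
 }

// Smallest depth d such that 2^d >= n!, a lower bound on the
// number of comparisons along the longest path of such a tree.
unsigned min_comparisons(unsigned n)
 {
  const std::uint64_t leaves = arrangements(n);
  unsigned d = 0;
  while ((std::uint64_t(1) << d) < leaves) d++;
  return d;
 }
```

<p>For three items this gives 6 leaves and 3 comparisons, which matches <tt>median3</tt>; for ten items it is already 3,628,800 leaves and at least 22 comparisons on the longest path.</p>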
<p>Moreover, this method is branch-intensive: its branches depend on the data, so they will be difficult for the <a href="http://en.wikipedia.org/wiki/Branch_predictor" target="_blank">branch predictor</a> to predict, therefore (probably) harming performance quite a bit.</p>
<p>If we allow swaps (and consider them inexpensive), we can do something using <a href="http://en.wikipedia.org/wiki/Sorting_network" target="_blank">sorting networks</a>.</p>
<p>A <em>sorting network</em> can be understood through the metaphor of a train-yard sorting station. You have <img src='http://s0.wp.com/latex.php?latex=n&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n' title='n' class='latex' /> lanes, and along the lanes, you have a certain number of exchanging junctions onto other lanes. That is, if the train on lane <img src='http://s0.wp.com/latex.php?latex=i&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='i' title='i' class='latex' /> switches to lane <img src='http://s0.wp.com/latex.php?latex=j&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='j' title='j' class='latex' />, then the train on lane <img src='http://s0.wp.com/latex.php?latex=j&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='j' title='j' class='latex' /> switches to lane <img src='http://s0.wp.com/latex.php?latex=i&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='i' title='i' class='latex' />.</p>
<p>The goal is to figure out the most efficient network (with the smallest number of switches) for a given number of lanes. Let us see what sorting networks look like.</p>
<p>For <img src='http://s0.wp.com/latex.php?latex=n%3D1&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n=1' title='n=1' class='latex' /> or <img src='http://s0.wp.com/latex.php?latex=n%3D2&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n=2' title='n=2' class='latex' />, there isn&#8217;t much to do: with <img src='http://s0.wp.com/latex.php?latex=n%3D2&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n=2' title='n=2' class='latex' />, you just compare the two values and swap them if they are out of order (that is, if <img src='http://s0.wp.com/latex.php?latex=a%3Eb&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='a&gt;b' title='a&gt;b' class='latex' />, when you want increasing order). In C++, the basic operation is therefore given by:</p>
<p><pre class="brush: cpp;">
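#include &lt;utility&gt; // std::swap (in &lt;algorithm&gt; before C++11)

template &lt;typename T&gt;
void cmpexchg(T &amp; a, T &amp; b)
 {
  using std::swap; // ADL trick: lets a type-specific swap be found
  if (a&gt;b) swap(a,b); // afterwards, a is not greater than b
 }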
</pre></p>
<p>and the graph representing the network is denoted as:</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/network-2.png"><img src="http://hbfs.files.wordpress.com/2011/12/network-2.png?w=450" alt="" title="network-2"   class="aligncenter size-full wp-image-3819" /></a></p>
<p>Where lanes are joined by lines, and circle/dots indicate that the lanes are connected through the line (this is somewhat of an electrical engineering notation, but the large dots, although redundant, make it easier to understand the graph). At <img src='http://s0.wp.com/latex.php?latex=n%3D3&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n=3' title='n=3' class='latex' />, we have a network such as:</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/network-3.png"><img src="http://hbfs.files.wordpress.com/2011/12/network-3.png?w=450" alt="" title="network-3"   class="aligncenter size-full wp-image-3820" /></a></p>
<p>(where the color lines locate &#8216;sequence points&#8217; in the network which determine which compares can be done in parallel) which would yield the code:</p>
<p><pre class="brush: cpp;">
template &lt;typename T&gt;
T median3_network( T a,
                   T b,
                   T c )
 {
  cmpexchg(a,c);
  cmpexchg(a,b);
  cmpexchg(b,c);
  return b;
 }
</pre></p>
<p>One of the possible networks for <img src='http://s0.wp.com/latex.php?latex=n%3D4&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n=4' title='n=4' class='latex' /> is given by:</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/network-4.png"><img src="http://hbfs.files.wordpress.com/2011/12/network-4.png?w=450" alt="" title="network-4"   class="aligncenter size-full wp-image-3821" /></a></p>
<p>and finally, for <img src='http://s0.wp.com/latex.php?latex=n%3D5&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n=5' title='n=5' class='latex' />, one of the possible solutions looks like:</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/network-5.png"><img src="http://hbfs.files.wordpress.com/2011/12/network-5.png?w=450" alt="" title="network-5"   class="aligncenter size-full wp-image-3822" /></a></p>
<p>&#8230;which is already rather more complex than with <img src='http://s0.wp.com/latex.php?latex=n%3D4&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n=4' title='n=4' class='latex' />.</p>
<p>The advantage of a sorting network is that, in theory, a great number of compare/exchange operations can be performed in parallel (those that touch different lanes are mutually independent), which collapses the depth of the network significantly. Assuming that we can do up to <img src='http://s0.wp.com/latex.php?latex=n&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n' title='n' class='latex' /> comparisons in parallel, the depth of such a network is <img src='http://s0.wp.com/latex.php?latex=O%28%28%5Clg+n%29%5E2%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O((&#92;lg n)^2)' title='O((&#92;lg n)^2)' class='latex' /> (but with size <img src='http://s0.wp.com/latex.php?latex=O%28n+%28%5Clg+n%29%5E2%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(n (&#92;lg n)^2)' title='O(n (&#92;lg n)^2)' class='latex' />).</p>
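<p>The difference between the size of a network (how many compare/exchange operations it has) and its depth (how many time steps it needs when independent operations run together) can be made concrete with a small sketch. The <tt>comparator</tt> type and <tt>network_depth</tt> function are assumptions made for this illustration: a network is simply a list of lane pairs, applied in order.</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// A comparator (i,j) compare-exchanges lanes i and j (illustrative type).
typedef std::pair<std::size_t, std::size_t> comparator;

// Depth of a network on n lanes: comparators touching disjoint lanes may
// share a time step, so each comparator is scheduled one step after the
// latest comparator that already used one of its two lanes.
std::size_t network_depth(std::size_t n, const std::vector<comparator> & net)
 {
  std::vector<std::size_t> busy_until(n, 0); // last step used, per lane
  std::size_t depth = 0;
  for (std::size_t c = 0; c < net.size(); c++)
   {
    const std::size_t i = net[c].first;
    const std::size_t j = net[c].second;
    const std::size_t step = std::max(busy_until[i], busy_until[j]) + 1;
    busy_until[i] = step;
    busy_until[j] = step;
    depth = std::max(depth, step);
   }
  return depth;
 }
```

<p>The three-lane network used by <tt>median3_network</tt>, (0,2), (0,1), (1,2), has size 3 and depth 3, since every comparator shares a lane with the previous one. A standard five-comparator network on four lanes, (0,1), (2,3), (0,2), (1,3), (1,2) (not necessarily the one drawn above), also has depth 3 despite its larger size.</p>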
<p>The disadvantage is that, in general, it is <em>hard</em> (in the <a href="http://en.wikipedia.org/wiki/Co-NP" target="_blank">co-NP</a>-complete sense) merely to verify that a given network sorts every possible input, so determining whether or not a network is optimal for the size of the problem, or arriving at an optimal network in the first place, is harder still!</p>
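<p>For small networks, correctness (though not optimality) can still be checked by brute force. The sketch below relies on the zero-one principle: a comparator network sorts every input if and only if it sorts every sequence of 0s and 1s, so it is enough to try the 2<sup>n</sup> bit patterns. The names are invented for this example, comparators are assumed to be given as pairs (i,j) with i smaller than j, and the exponential loop is only reasonable for a few dozen lanes at most.</p>

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// A comparator (i,j), i < j, compare-exchanges lanes i and j (illustrative type).
typedef std::pair<std::size_t, std::size_t> comparator;

// Brute-force check based on the zero-one principle.
// Assumes n < 32 so that 2^n fits in an unsigned long.
bool is_sorting_network(std::size_t n, const std::vector<comparator> & net)
 {
  for (unsigned long bits = 0; bits < (1ul << n); bits++)
   {
    // load one 0/1 pattern onto the lanes
    std::vector<int> lane(n);
    for (std::size_t i = 0; i < n; i++) lane[i] = (bits >> i) & 1;

    // run the network
    for (std::size_t c = 0; c < net.size(); c++)
     {
      const std::size_t i = net[c].first;
      const std::size_t j = net[c].second;
      if (lane[i] > lane[j]) std::swap(lane[i], lane[j]);
     }

    // the output must be non-decreasing
    for (std::size_t i = 1; i < n; i++)
     if (lane[i - 1] > lane[i]) return false;
   }
  return true;
 }
```

<p>The network behind <tt>median3_network</tt>, (0,2), (0,1), (1,2), passes this test; drop its last comparator and the pattern 0,1,0 comes out unsorted.</p>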
<p align="center">*<br />*&emsp;*</p>
<p>So sorting networks are great if you can have parallel compare/exchange operations, but when we examine the primitive <tt>cmpexchg(T &amp; a, T &amp; b)</tt>, we see they&#8217;re not fundamentally more efficient than the if-tree based solution. If anything, they&#8217;re much worse if the cost of swapping is high.</p>
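<p>For small arithmetic types, where copies are cheap, one possible mitigation of the branching cost is to write the compare/exchange with <tt>std::min</tt> and <tt>std::max</tt> rather than an explicit <tt>if</tt>. Compilers can often, though not always, turn this into conditional moves or min/max instructions; that is an expectation to be checked on your own compiler, not a guarantee. The names <tt>cmpexchg_minmax</tt> and <tt>median3_minmax</tt> are made up for this sketch.</p>

```cpp
#include <algorithm>

// Compare-exchange without an explicit if, meant for plain numbers.
// For equivalent-but-distinct objects it could duplicate one of the two,
// so it is not a drop-in replacement for a swap-based version.
template <typename T>
void cmpexchg_minmax(T & a, T & b)
 {
  const T lo = std::min(a, b);
  const T hi = std::max(a, b);
  a = lo;
  b = hi;
 }

// Same three-comparator network as median3_network, on copies of the inputs.
template <typename T>
T median3_minmax(T a, T b, T c)
 {
  cmpexchg_minmax(a, c);
  cmpexchg_minmax(a, b);
  cmpexchg_minmax(b, c);
  return b;
 }
```

<p>This does nothing for heavy objects, where the copies themselves are the problem.</p>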
<p align="center">*<br />*&emsp;*</p>
<p>Can we use a primitive such as <tt>median3</tt> to get the median of a much larger number of values than just three? What is a <tt>median3</tt> of three <tt>median3</tt>?</p>
<p><em>To Be Continued&#8230;</em></p>
<br />Filed under: <a href='http://hbfs.wordpress.com/category/algorithms/'>algorithms</a>, <a href='http://hbfs.wordpress.com/category/c/'>C</a>, <a href='http://hbfs.wordpress.com/category/c-plus-plus/'>C-plus-plus</a>, <a href='http://hbfs.wordpress.com/category/data-structures/'>data structures</a>, <a href='http://hbfs.wordpress.com/category/programming/'>programming</a> Tagged: <a href='http://hbfs.wordpress.com/tag/adl/'>ADL</a>, <a href='http://hbfs.wordpress.com/tag/cmpxchg/'>cmpxchg</a>, <a href='http://hbfs.wordpress.com/tag/median/'>median</a>, <a href='http://hbfs.wordpress.com/tag/sorting-networks/'>Sorting Networks</a>]]></content:encoded>
			<wfw:commentRss>http://hbfs.wordpress.com/2012/01/10/medians-part-ii/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/56c87cdd18d9a97255c0343c80fba38d?s=96&#38;d=identicon" medium="image">
			<media:title type="html">stevenpigeon</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/split-rock-small.jpg" medium="image">
			<media:title type="html">split-rock-small</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/network-2.png" medium="image">
			<media:title type="html">network-2</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/network-3.png" medium="image">
			<media:title type="html">network-3</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/network-4.png" medium="image">
			<media:title type="html">network-4</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/network-5.png" medium="image">
			<media:title type="html">network-5</media:title>
		</media:content>
	</item>
		<item>
		<title>Building a Balanced Tree From a List in Linear Time</title>
		<link>http://hbfs.wordpress.com/2012/01/03/building-a-balanced-tree-from-a-list-in-linear-time/</link>
		<comments>http://hbfs.wordpress.com/2012/01/03/building-a-balanced-tree-from-a-list-in-linear-time/#comments</comments>
		<pubDate>Tue, 03 Jan 2012 15:41:00 +0000</pubDate>
		<dc:creator>Steven Pigeon</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[data compression]]></category>
		<category><![CDATA[data structures]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Compact Tree]]></category>
		<category><![CDATA[Compact Tree Storage]]></category>
		<category><![CDATA[Huffman]]></category>
		<category><![CDATA[Huffman Codes]]></category>
		<category><![CDATA[phase-in codes]]></category>
		<category><![CDATA[Search Tree]]></category>
		<category><![CDATA[Segregated Storage]]></category>
		<category><![CDATA[Tree]]></category>

		<guid isPermaLink="false">http://hbfs.wordpress.com/?p=3846</guid>
		<description><![CDATA[The usual way of forming a search tree from a list is to scan the list and insert each of its elements, one by one, into the tree, leading to a(n expected) run-time of O(n lg n). However, if the list is sorted (in ascending order, say) and the tree is not one of the self-balancing varieties, [...]]]></description>
			<content:encoded><![CDATA[<p>The usual way of forming a <a href="http://en.wikipedia.org/wiki/Search_tree" target="_blank">search tree</a> from a <a href="http://en.wikipedia.org/wiki/List_%28abstract_data_type%29" target="_blank">list</a> is to scan the list and insert each of its elements, one by one, into the tree, leading to a(n expected) run-time of <img src='http://s0.wp.com/latex.php?latex=O%28n+%5Clg+n%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(n &#92;lg n)' title='O(n &#92;lg n)' class='latex' />.</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/img_6725-small.jpg"><img src="http://hbfs.files.wordpress.com/2011/12/img_6725-small.jpg?w=200&#038;h=132" alt="" title="IMG_6725-small" width="200" height="132" class="aligncenter size-thumbnail wp-image-3851" /></a></p>
<p>However, if the list is sorted (in ascending order, say) and the tree is not one of the <a href="http://en.wikipedia.org/wiki/Self-balancing_binary_search_tree" target="_blank">self-balancing varieties</a>, building the tree this way is <img src='http://s0.wp.com/latex.php?latex=O%28n%5E2%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(n^2)' title='O(n^2)' class='latex' />, because the &#8220;tree&#8221; created by the successive insertions of sorted keys is in fact a degenerate tree, a list. So, what if the list is already sorted and you don&#8217;t really want to have a self-balancing tree? Well, it turns out that you can build a(n almost perfectly) balanced tree in <img src='http://s0.wp.com/latex.php?latex=O%28n%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(n)' title='O(n)' class='latex' />.</p>
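<p>The degenerate case is easy to reproduce. The following sketch uses a deliberately naive binary search tree (all names are invented for this illustration, with no rebalancing of any kind) and measures the height of the tree obtained by inserting keys in a given order.</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// A plain, non-self-balancing binary search tree node (illustrative only).
struct bst_node
 {
  int key;
  bst_node * left;
  bst_node * right;
  explicit bst_node(int k) : key(k), left(nullptr), right(nullptr) {}
 };

// Iterative insertion, no rebalancing; returns the (possibly new) root.
bst_node * bst_insert(bst_node * root, int key)
 {
  bst_node * fresh = new bst_node(key);
  if (!root) return fresh;
  bst_node * p = root;
  for (;;)
   {
    bst_node * & next = (key < p->key) ? p->left : p->right;
    if (!next) { next = fresh; return root; }
    p = next;
   }
 }

// Number of nodes on the longest root-to-leaf path.
std::size_t bst_height(const bst_node * n)
 {
  return n ? 1 + std::max(bst_height(n->left), bst_height(n->right)) : 0;
 }

void bst_free(bst_node * n)
 {
  if (!n) return;
  bst_free(n->left);
  bst_free(n->right);
  delete n;
 }

// Height of the tree obtained by inserting the keys in the given order.
std::size_t height_after_inserts(const std::vector<int> & keys)
 {
  bst_node * root = nullptr;
  for (std::size_t i = 0; i < keys.size(); i++) root = bst_insert(root, keys[i]);
  const std::size_t h = bst_height(root);
  bst_free(root);
  return h;
 }
```

<p>Inserting 1, 2, 3, and so on up to 100 in ascending order yields a tree of height 100, a list in all but name, and each insertion has to walk the whole chain. Inserting 4, 2, 6, 1, 3, 5, 7 in that order yields a height of 3.</p>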
<p><span id="more-3846"></span></p>
<p>Let us make the simplifying assumption that we can have two types of nodes in the tree: leaves, which contain the actual data, and internal nodes (or just nodes for the remainder of this post), which hold only a key.</p>
<p>The first strategy that comes to mind, using this assumption, is to use a method reminiscent of how <a href="http://hbfs.wordpress.com/2011/05/17/huffman-codes/" target="_blank">Huffman Codes</a> are constructed. That is, we scan the original list from left to right: we take two nodes (if there are at least two left), remove them from the list, and insert back in their place a new node containing the key of the second list item (remember, the list is supposed to be sorted in ascending order), with the two original nodes (or leaves) as children. That is, the operation goes something like this:</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/tree-diagram-merge.png"><img src="http://hbfs.files.wordpress.com/2011/12/tree-diagram-merge.png?w=150&#038;h=53" alt="" title="tree-diagram-merge" width="150" height="53" class="aligncenter size-thumbnail wp-image-3852" /></a></p>
<p>(Notice the metaphor: leaves are green, internal nodes brown.) We simply re-scan the list until we have a single remaining node, which will be the root of the tree. The procedure is simple enough, indeed. Let us work out a full example with 5 leaves:</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/tree-diagram1.png"><img src="http://hbfs.files.wordpress.com/2011/12/tree-diagram1.png?w=150&#038;h=21" alt="" title="tree-diagram1" width="150" height="21" class="aligncenter size-thumbnail wp-image-3853" /></a></p>
<p>Then we do one pass of merges:</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/tree-diagram2.png"><img src="http://hbfs.files.wordpress.com/2011/12/tree-diagram2.png?w=150&#038;h=53" alt="" title="tree-diagram2" width="150" height="53" class="aligncenter size-thumbnail wp-image-3854" /></a></p>
<p>Then another:</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/tree-diagram3.png"><img src="http://hbfs.files.wordpress.com/2011/12/tree-diagram3.png?w=150&#038;h=85" alt="" title="tree-diagram3" width="150" height="85" class="aligncenter size-thumbnail wp-image-3855" /></a></p>
<p>&#8230;and finally:</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/tree-diagram4.png"><img src="http://hbfs.files.wordpress.com/2011/12/tree-diagram4.png?w=150&#038;h=109" alt="" title="tree-diagram4" width="150" height="109" class="aligncenter size-thumbnail wp-image-3856" /></a></p>
<p>What do you notice? The tree isn&#8217;t all that balanced. In fact, it&#8217;s <em>really</em> unbalanced. What went wrong? Well, while this algorithm is optimal for Huffman Codes, it clearly is not for search trees. For one thing, Huffman&#8217;s method wants to build trees with average path-lengths as close as possible to the source&#8217;s entropy; here our goal is to have path-lengths as equal as possible.</p>
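<p>The repeated left-to-right merging can be sketched in a few lines, which also lets us measure how unbalanced the result is. Everything here is an assumption made for the illustration: the <tt>node</tt> layout, the function names, and the use of <tt>int</tt> keys; leaves are simply nodes without children.</p>

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Leaf: holds its own key, no children.
// Internal node: holds the key of the second (right) item it was built from.
struct node
 {
  int key;
  std::unique_ptr<node> left, right;
 };

typedef std::vector<std::unique_ptr<node>> node_list;

// One left-to-right pass: each pair of neighbours is replaced by a parent;
// an odd node out at the end of the list is carried over unchanged.
node_list merge_pass(node_list list)
 {
  node_list out;
  std::size_t i = 0;
  for (; i + 1 < list.size(); i += 2)
   {
    std::unique_ptr<node> parent(new node);
    parent->key = list[i + 1]->key; // read the key before moving the child
    parent->left = std::move(list[i]);
    parent->right = std::move(list[i + 1]);
    out.push_back(std::move(parent));
   }
  if (i < list.size()) out.push_back(std::move(list[i]));
  return out;
 }

// Re-scan the list until a single node, the root, remains.
std::unique_ptr<node> build_naive(const std::vector<int> & sorted_keys)
 {
  node_list list;
  for (std::size_t i = 0; i < sorted_keys.size(); i++)
   {
    std::unique_ptr<node> leaf(new node);
    leaf->key = sorted_keys[i];
    list.push_back(std::move(leaf));
   }
  while (list.size() > 1) list = merge_pass(std::move(list));
  if (list.empty()) return std::unique_ptr<node>();
  return std::move(list[0]);
 }

// Sum of the depths of all leaves (the root is at depth 0).
std::size_t total_leaf_depth(const node * n, std::size_t depth = 0)
 {
  if (!n) return 0;
  if (!n->left && !n->right) return depth;
  return total_leaf_depth(n->left.get(), depth + 1)
       + total_leaf_depth(n->right.get(), depth + 1);
 }
```

<p>With five leaves, the first four end up at depth 3 and the last one at depth 1, for a total leaf depth of 13.</p>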
<p>Fortunately, we can &#8220;repair&#8221; the algorithm quite easily, and it&#8217;s another code that will come to our aid: <a href="http://hbfs.wordpress.com/2008/09/02/phase-in-codes/" target="_blank">Phase-In Codes</a>, which have average code lengths close to <img src='http://s0.wp.com/latex.php?latex=%5Clg+n&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;lg n' title='&#92;lg n' class='latex' /> (as I show <a href="http://hbfs.wordpress.com/2008/09/23/length-of-phase-in-codes/" target="_blank">here</a>).</p>
<p>The key observation is that if the length of the list, <img src='http://s0.wp.com/latex.php?latex=n&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n' title='n' class='latex' />, is a power of two, then the list can be successively merged together by the above algorithm and yields a tree that has a depth of <em>exactly</em> <img src='http://s0.wp.com/latex.php?latex=%5Clg+n&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;lg n' title='&#92;lg n' class='latex' />. Therefore, if we somehow modify the list to have exactly <img src='http://s0.wp.com/latex.php?latex=2%5Ek&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='2^k' title='2^k' class='latex' /> items in it, we&#8217;ll be able to have our optimal <img src='http://s0.wp.com/latex.php?latex=O%28%5Clg+n%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(&#92;lg n)' title='O(&#92;lg n)' class='latex' /> depth. Phase-In Codes provide the means to do so. If</p>
<p align="center"><img src='http://s0.wp.com/latex.php?latex=n+%3D+2%5Ek%2Bb&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n = 2^k+b' title='n = 2^k+b' class='latex' /></p>
<p>with the largest <img src='http://s0.wp.com/latex.php?latex=k%5Cin%5Cmathbb%7BN%7D&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='k&#92;in&#92;mathbb{N}' title='k&#92;in&#92;mathbb{N}' class='latex' /> and with <img src='http://s0.wp.com/latex.php?latex=b%5Cin%5Cmathbb%7BZ%5E%2A%7D&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='b&#92;in&#92;mathbb{Z^*}' title='b&#92;in&#92;mathbb{Z^*}' class='latex' /> (thus imposing the uniqueness of the solution, with <img src='http://s0.wp.com/latex.php?latex=0%5Cleqslant%7Bb%7D%3C%7B2%5Ek%7D&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='0&#92;leqslant{b}&lt;{2^k}' title='0&#92;leqslant{b}&lt;{2^k}' class='latex' />), we can merge the first <img src='http://s0.wp.com/latex.php?latex=2b&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='2b' title='2b' class='latex' /> nodes, and it will result in a list of <img src='http://s0.wp.com/latex.php?latex=2%5Ek&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='2^k' title='2^k' class='latex' /> nodes (<img src='http://s0.wp.com/latex.php?latex=b&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='b' title='b' class='latex' /> of which are now internal nodes). Once we have merged the first <img src='http://s0.wp.com/latex.php?latex=2b&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='2b' title='2b' class='latex' /> nodes, we stop the iteration there and restart from the beginning of the list, which is now of length <img src='http://s0.wp.com/latex.php?latex=2%5Ek&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='2^k' title='2^k' class='latex' />, and will yield the most-equal depth tree we wanted.</p>
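<p>Finding <tt>k</tt> and <tt>b</tt> for a given list length is a matter of a few shifts. A minimal sketch, in which the struct and function names are invented for the example:</p>

```cpp
#include <cassert>
#include <cstddef>

// Result of writing n as 2^k + b (illustrative type).
struct phase_in_split
 {
  unsigned k;
  std::size_t b;
 };

// Writes n >= 1 as 2^k + b with the largest possible k,
// which forces 0 <= b < 2^k.
phase_in_split split_length(std::size_t n)
 {
  assert(n >= 1);
  phase_in_split s;
  s.k = 0;
  for (std::size_t m = n; m > 1; m >>= 1) s.k++; // k = floor(lg n)
  s.b = n - (std::size_t(1) << s.k);
  return s;
 }
```

<p>For a list of 7 items this gives <tt>k</tt> = 2 and <tt>b</tt> = 3: merging the first 6 nodes pairwise leaves 3 internal nodes plus the untouched seventh item, that is, 4 nodes.</p>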
<p>Again, with <img src='http://s0.wp.com/latex.php?latex=n%3D5&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n=5' title='n=5' class='latex' />: <img src='http://s0.wp.com/latex.php?latex=n%3D2%5Ek%2Bb%3D2%5E2%2B1&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n=2^k+b=2^2+1' title='n=2^k+b=2^2+1' class='latex' />, so <img src='http://s0.wp.com/latex.php?latex=b%3D1&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='b=1' title='b=1' class='latex' />, and we take the first two nodes</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/tree-diagram1.png"><img src="http://hbfs.files.wordpress.com/2011/12/tree-diagram1.png?w=150&#038;h=21" alt="" title="tree-diagram1" width="150" height="21" class="aligncenter size-thumbnail wp-image-3853" /></a></p>
<p>and merge them:</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/tree-diagram5.png"><img src="http://hbfs.files.wordpress.com/2011/12/tree-diagram5.png?w=150&#038;h=53" alt="" title="tree-diagram5" width="150" height="53" class="aligncenter size-thumbnail wp-image-3860" /></a></p>
<p>and <em>voilà</em>, we proceed with the successive merges:</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/tree-diagram6.png"><img src="http://hbfs.files.wordpress.com/2011/12/tree-diagram6.png?w=150&#038;h=85" alt="" title="tree-diagram6" width="150" height="85" class="aligncenter size-thumbnail wp-image-3861" /></a></p>
<p>and then</p>
<p><a href="http://hbfs.files.wordpress.com/2011/12/tree-diagram7.png"><img src="http://hbfs.files.wordpress.com/2011/12/tree-diagram7.png?w=150&#038;h=109" alt="" title="tree-diagram7" width="150" height="109" class="aligncenter size-thumbnail wp-image-3862" /></a></p>
<p>and we have a tree of almost equal depth, with an average leaf depth of <img src='http://s0.wp.com/latex.php?latex=%5E%7B12%7D%2F_%7B5%7D%3D2.4&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='^{12}/_{5}=2.4' title='^{12}/_{5}=2.4' class='latex' /> (while the actual <img src='http://s0.wp.com/latex.php?latex=%5Clg+5+%5Capprox+2.32&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;lg 5 &#92;approx 2.32' title='&#92;lg 5 &#92;approx 2.32' class='latex' />) rather than the <img src='http://s0.wp.com/latex.php?latex=%5E%7B13%7D%2F_%7B5%7D%3D2.6&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='^{13}/_{5}=2.6' title='^{13}/_{5}=2.6' class='latex' /> of the first attempt, a considerable improvement, even for this trivial tree.</p>
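<p>Putting the pieces together, the whole construction might look like the sketch below: a first partial pass merges only the first 2<tt>b</tt> nodes, and full passes then finish the job on a list whose length is a power of two. As before, the <tt>node</tt> layout, the names, and the <tt>int</tt> keys are assumptions made for the illustration, not a definitive implementation.</p>

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Leaf: own key, no children. Internal node: key of its right child.
struct node
 {
  int key;
  std::unique_ptr<node> left, right;
 };

typedef std::vector<std::unique_ptr<node>> node_list;

// Merges neighbouring pairs among the first `count` nodes only;
// everything after that is carried over unchanged.
node_list merge_prefix(node_list list, std::size_t count)
 {
  node_list out;
  std::size_t i = 0;
  for (; i + 1 < count; i += 2)
   {
    std::unique_ptr<node> parent(new node);
    parent->key = list[i + 1]->key; // read the key before moving the child
    parent->left = std::move(list[i]);
    parent->right = std::move(list[i + 1]);
    out.push_back(std::move(parent));
   }
  for (; i < list.size(); i++) out.push_back(std::move(list[i]));
  return out;
 }

std::unique_ptr<node> build_balanced(const std::vector<int> & sorted_keys)
 {
  node_list list;
  for (std::size_t i = 0; i < sorted_keys.size(); i++)
   {
    std::unique_ptr<node> leaf(new node);
    leaf->key = sorted_keys[i];
    list.push_back(std::move(leaf));
   }
  const std::size_t n = list.size();
  if (n == 0) return std::unique_ptr<node>();

  std::size_t pow2 = 1;                 // largest 2^k not exceeding n
  while (pow2 * 2 <= n) pow2 *= 2;
  const std::size_t b = n - pow2;

  // phase-in pass: afterwards the list holds exactly 2^k nodes
  if (b != 0) list = merge_prefix(std::move(list), 2 * b);

  // full passes: the length halves each time
  while (list.size() > 1)
   {
    const std::size_t m = list.size(); // taken before the list is moved from
    list = merge_prefix(std::move(list), m);
   }
  return std::move(list[0]);
 }

// Sum of the depths of all leaves (the root is at depth 0).
std::size_t total_leaf_depth(const node * n, std::size_t depth = 0)
 {
  if (!n) return 0;
  if (!n->left && !n->right) return depth;
  return total_leaf_depth(n->left.get(), depth + 1)
       + total_leaf_depth(n->right.get(), depth + 1);
 }
```

<p>For five leaves the total leaf depth is 12, and for eight leaves it is 24, every leaf sitting at depth 3.</p>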
<p align="center">*<br />*&emsp;*</p>
<p>While it is not the customary way of representing a tree in memory, the segregation of leaves and internal nodes may be justified in the context where the leaves and the tree itself live in different memory locations. For example, if the leaves are rather large compared to just the key, they may have to reside in external memory (that is, the disk) while you would still like to keep the index, the tree structure itself, in memory.</p>
<p>One could also argue that this scheme is cache-friendlier as one can use some kind of <a href="http://hbfs.wordpress.com/2009/04/07/compact-tree-storage/" target="_blank">compact tree storage</a> to maintain the keys and the structure of the tree in a small amount of memory, and with keys that are nearby in the tree being laid-out nearby in memory, thus reducing <a href="http://en.wikipedia.org/wiki/CPU_cache#Cache_miss" target="_blank">cache misses</a> quite a lot.</p>
<p>In all cases, we have reduced the complexity of building the initial tree from <img src='http://s0.wp.com/latex.php?latex=O%28n+%5Clg+n%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(n &#92;lg n)' title='O(n &#92;lg n)' class='latex' /> to essentially <img src='http://s0.wp.com/latex.php?latex=O%28n%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='O(n)' title='O(n)' class='latex' />: each pass halves the length of the list, so all the passes together scan a number of items that is roughly twice the original length.</p>
<br />Filed under: <a href='http://hbfs.wordpress.com/category/algorithms/'>algorithms</a>, <a href='http://hbfs.wordpress.com/category/data-compression/'>data compression</a>, <a href='http://hbfs.wordpress.com/category/data-structures/'>data structures</a>, <a href='http://hbfs.wordpress.com/category/programming/'>programming</a> Tagged: <a href='http://hbfs.wordpress.com/tag/compact-tree/'>Compact Tree</a>, <a href='http://hbfs.wordpress.com/tag/compact-tree-storage/'>Compact Tree Storage</a>, <a href='http://hbfs.wordpress.com/tag/huffman/'>Huffman</a>, <a href='http://hbfs.wordpress.com/tag/huffman-codes/'>Huffman Codes</a>, <a href='http://hbfs.wordpress.com/tag/phase-in-codes/'>phase-in codes</a>, <a href='http://hbfs.wordpress.com/tag/search-tree/'>Search Tree</a>, <a href='http://hbfs.wordpress.com/tag/segregated-storage/'>Segregated Storage</a>, <a href='http://hbfs.wordpress.com/tag/tree/'>Tree</a>]]></content:encoded>
			<wfw:commentRss>http://hbfs.wordpress.com/2012/01/03/building-a-balanced-tree-from-a-list-in-linear-time/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/56c87cdd18d9a97255c0343c80fba38d?s=96&#38;d=identicon" medium="image">
			<media:title type="html">stevenpigeon</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/img_6725-small.jpg?w=150" medium="image">
			<media:title type="html">IMG_6725-small</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/tree-diagram-merge.png?w=150" medium="image">
			<media:title type="html">tree-diagram-merge</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/tree-diagram1.png?w=150" medium="image">
			<media:title type="html">tree-diagram1</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/tree-diagram2.png?w=150" medium="image">
			<media:title type="html">tree-diagram2</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/tree-diagram3.png?w=150" medium="image">
			<media:title type="html">tree-diagram3</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/tree-diagram4.png?w=150" medium="image">
			<media:title type="html">tree-diagram4</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/tree-diagram5.png?w=150" medium="image">
			<media:title type="html">tree-diagram5</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/tree-diagram6.png?w=150" medium="image">
			<media:title type="html">tree-diagram6</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/tree-diagram7.png?w=150" medium="image">
			<media:title type="html">tree-diagram7</media:title>
		</media:content>
	</item>
		<item>
		<title>Suggested Reading: Mini Weapons of Mass Destruction 2</title>
		<link>http://hbfs.wordpress.com/2011/12/30/suggested-reading-mini-weapons-of-mass-destruction-2/</link>
		<comments>http://hbfs.wordpress.com/2011/12/30/suggested-reading-mini-weapons-of-mass-destruction-2/#comments</comments>
		<pubDate>Sat, 31 Dec 2011 01:36:38 +0000</pubDate>
		<dc:creator>Steven Pigeon</dc:creator>
				<category><![CDATA[Life in the workplace]]></category>
		<category><![CDATA[Suggested Reading]]></category>

		<guid isPermaLink="false">http://hbfs.wordpress.com/?p=3838</guid>
		<description><![CDATA[John Austin &#8212;&#160;Mini Weapons of Mass Destruction 2 &#8212; Build A Secret Agent Arsenal&#160;&#8212; Chicago Review Press 2011, 260 pp. ISBN&#160;978-1-56976-716-0 A quite amusing little book of needlessly complicated hacks that can nonetheless cause quite a ruckus in the office or at school. Q-Tip launchers, (paper) ninja stars, rubber band weapons, CD periscopes, &#8230; all built from readily available [...]]]></description>
			<content:encoded><![CDATA[<p>John Austin &mdash;&nbsp;<a href="http://www.amazon.com/gp/product/1569767165?ie=UTF8&amp;tag=hardbettfasts-20&amp;linkCode=xm2&amp;camp=1789&amp;creativeASIN=1569767165" target="_blank"><i>Mini Weapons of Mass Destruction 2 &mdash;<br />
Build A Secret Agent Arsenal</i></a>&nbsp;&mdash; Chicago Review Press<br />
2011, 260 pp. ISBN&nbsp;978-1-56976-716-0</p>
<div id="attachment_3839" class="wp-caption aligncenter" style="width: 150px"><a href="http://www.amazon.com/gp/product/1569767165?ie=UTF8&amp;tag=hardbettfasts-20&amp;linkCode=xm2&amp;camp=1789&amp;creativeASIN=1569767165"><img src="http://hbfs.files.wordpress.com/2011/12/mini-weapons-2.jpg?w=450" alt="" title="mini-weapons-2"   class="size-full wp-image-3839" /></a><p class="wp-caption-text">(Buy at Amazon.com)</p></div>
<p>A quite amusing little book of needlessly complicated hacks that can nonetheless cause quite a ruckus in the office or at school. Q-Tip launchers, (paper) ninja stars, rubber band weapons, CD periscopes, &#8230; all built from readily available office supplies. In fact, they are all <em>way</em> too complicated, but sooo much fun.</p>
<br />Filed under: <a href='http://hbfs.wordpress.com/category/life-in-the-workplace/'>Life in the workplace</a>, <a href='http://hbfs.wordpress.com/category/suggested-reading/'>Suggested Reading</a>]]></content:encoded>
			<wfw:commentRss>http://hbfs.wordpress.com/2011/12/30/suggested-reading-mini-weapons-of-mass-destruction-2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/56c87cdd18d9a97255c0343c80fba38d?s=96&#38;d=identicon" medium="image">
			<media:title type="html">stevenpigeon</media:title>
		</media:content>

		<media:content url="http://hbfs.files.wordpress.com/2011/12/mini-weapons-2.jpg" medium="image">
			<media:title type="html">mini-weapons-2</media:title>
		</media:content>
	</item>
	</channel>
</rss>
