Suggested Reading: Google’s PageRank and Beyond: The Science of Search Engine Rankings

24/01/2009

Amy N. Langville and Carl D. Meyer, Google’s PageRank and Beyond: The Science of Search Engine Rankings, Princeton University Press, 2006, 224 pp., ISBN 0-691-12202-4.


Langville and Meyer present the mathematics behind PageRank, Google’s method of ranking web pages, as well as other similar methods. While the first few chapters are concerned with the history of search engines and search methods, the remainder of the text is clearly mathematical in nature.
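For the uninitiated, the heart of the method fits in one equation (this is the standard textbook form, not a quotation from the book):

\mathrm{PR}(p) = \frac{1-\alpha}{n} + \alpha \sum_{q \to p} \frac{\mathrm{PR}(q)}{\mathrm{outdeg}(q)}

where n is the number of pages, \alpha is the damping factor (typically 0.85), and the sum runs over the pages q that link to p. Iterating this to its fixed point amounts to computing, by the power method, the dominant eigenvector of the Google matrix, which is where the book’s linear algebra comes in.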



Throwing in More Hardware?

20/01/2009

In a recent blog entry, Jeff Atwood makes the case that “Hardware is Cheap, Programmers are Expensive”, basically arguing that in a lot of cases it’s more cost-efficient to throw cheap, faster hardware at a performance problem than to invest programmer time actually enhancing the code at the algorithmic and implementation levels.

I do, however, disagree with him on many points.



The Zune Freezes: Update!

05/01/2009

The faulty code was leaked some time last week. I’ve looked at it and, well, it’s surprisingly clean, despite the obvious defect, which they should have caught before release (see here).

Have a look at the ConvertDays routine.
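In a nutshell, before you click through: on December 31st of a leap year, days is exactly 366, neither branch of the loop makes progress, and it spins forever. Here is a minimal sketch of a possible patch (my own, not necessarily the fix that actually shipped), which handles that last day explicitly:

year = ORIGINYEAR; /* = 1980 */

while (days > 365)
{
    if (IsLeapYear(year))
    {
        if (days > 366)
        {
            days -= 366;
            year += 1;
        }
        else
        {
            break; /* days == 366: December 31st of a leap year */
        }
    }
    else
    {
        days -= 365;
        year += 1;
    }
}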



The Zune Freezes: A Stupid, Avoidable Bug.

31/12/2008

The few Zune users were alienated today when they discovered that their Zunes simply froze during boot. A mysterious bug, apparently.

The cause of the bug was quickly diagnosed. According to Itsnotabigtruck on Zuneboards, the bug can be traced back to a defective date-conversion routine. I reproduce the code here:

year = ORIGINYEAR; /* = 1980 */

while (days > 365)
{
    if (IsLeapYear(year))
    {
        if (days > 366)
        {
            days -= 366;
            year += 1;
        }
    }
    else
    {
        days -= 365;
        year += 1;
    }
}

Can you see what’s wrong with that code?



The 10 (classes of) Algorithms Every Programmer Must Know About

23/12/2008

In Tunnels of Doom!, I wrote that the disjoint sets algorithm is one of the very few algorithms every programmer should know. That got me thinking. Should? What about must? If everyone must know about disjoint sets, what other algorithms must every programmer know about?

I made a “top ten” list of algorithms and data structures every programmer must know about.
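Since disjoint sets keep coming up, here is a minimal sketch of the structure (union-find with path compression) in C. The names and the fixed-size universe are illustrative, not from the original post:

#include <stdio.h>

#define N 10 /* illustrative, fixed-size universe */

static int parent[N];

/* Start with each element in its own singleton set. */
static void make_sets(void)
{
    for (int i = 0; i < N; i++)
        parent[i] = i;
}

/* Find the set's representative, compressing the path as we go. */
static int find(int x)
{
    if (parent[x] != x)
        parent[x] = find(parent[x]);
    return parent[x];
}

/* Merge the sets containing a and b. */
static void unite(int a, int b)
{
    parent[find(a)] = find(b);
}

int main(void)
{
    make_sets();
    unite(1, 2);
    unite(2, 5);
    printf("%d\n", find(1) == find(5)); /* prints 1: same set */
    return 0;
}

Adding union by rank makes merges near-constant amortized time, but even this bare version shows why the structure is so useful.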



Tunnels of Doom!

16/12/2008

Æons ago, that is, in September 1982, I got my first true computer, the Texas Instruments TI-99/4A. A truly incredible machine, especially considering the era. Its CPU, the TMS 9900, a true 16-bit processor, ran at 3MHz. Even the popular TRS-80 Color Computer, Radio Shack’s 6809-based computer, ran at a mere 0.895MHz, or, if the gods were good to you, you could tweak the clock divisor and crank it up to 1.79MHz.

In addition to being my first programmable computer, the TI-99/4A sported truly nice graphics and a relatively advanced sound chip—for the era—and a number of games were developed for it: Hunt the Wumpus, Parsec, and others. But the one game that really got me interested in programming is Kevin Kenney’s Tunnels of Doom, one of the very first graphical dungeon crawlers. Though I’m not nostalgic by nature, I returned to that game about a year ago by an unusual route. I was looking into procedural content generation, especially map generation, in relation to data compression, and I remembered that the old game’s dungeons were generated randomly, yet were very well structured. I decided to take a look and see if I could generate maps like his.



Deriving the 1 bit = 6 dB rule of thumb

09/12/2008

This week, a more mathematical topic. Some time ago, we—friends and I—were discussing the fidelity of various signals and how many bits were needed for an optimal digitization of a signal, given known characteristics such as spectrum and signal-to-noise ratio.

Indeed, at some point, adding bits only buys you more precision to represent the noise in the signal. There’s a rule of thumb that says that for every bit you add, you can represent a signal with \approx 6 dB more signal-to-noise ratio. Let me show you how to derive such a result.
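As a preview, the whole argument hinges on one observation: a uniform quantizer with step size \Delta has quantization-noise power \Delta^2/12, and each added bit halves \Delta. Therefore

10 \log_{10} \frac{\Delta^2/12}{(\Delta/2)^2/12} = 10 \log_{10} 4 = 20 \log_{10} 2 \approx 6.02 \text{ dB},

so each extra bit buys roughly 6 dB of signal-to-noise ratio, and an n-bit quantizer tops out around 6.02n dB, plus a small constant that depends on the signal’s statistics.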



Safer Integer Constants (Part II)

02/12/2008

In C and C++, the behavior of integer promotions is not the same whether we use signed or unsigned integers. In bit-twiddling hacks, it is not immediately apparent that this is a problem, one that may lead to unexpected results whenever code is ported to a different architecture. This is a recurring topic whenever I discuss portability with friends: very often, they argue that if it works on Windows and on 32-bit x86 Linux, that’s pretty much all there is to it.

Of course, I couldn’t disagree more. I have learned the hard way.
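To make the problem concrete, here is a small C example of the kind of surprise I mean (a sketch of the general issue, not code from the post):

#include <stdio.h>

int main(void)
{
    /* Usual arithmetic conversions: i is converted to unsigned int,
       so -1 becomes UINT_MAX and the comparison holds. */
    int i = -1;
    unsigned int u = 1;
    if (i > u)
        printf("-1 > 1u once -1 is converted to unsigned\n");

    /* Integer promotions: c is promoted to int (255) before ~ is
       applied, so ~c is -256, never the 8-bit value 0x00. */
    unsigned char c = 0xFF;
    printf("~c == %d\n", ~c);

    return 0;
}

Most compilers will warn about the signed/unsigned comparison, which is rather the point: the warning exists because the conversion silently changes the result.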



Serialization, Binary Encodings, and Sync Markers

07/10/2008

Serialization, the process by which run-time objects are saved from memory to persistent storage (typically disk) or sent across the network, requires the objects to be encoded in some efficient, preferably machine-independent, encoding.

One could consider XML or JSON, which are both viable options whenever simplicity or human readability is required, or if every serialized object has a natural text-only representation. JSON, for example, provides only a limited number of basic data types: number, string, boolean, null, array, and object. What if you have a binary object? The standard approach with text encodings is to use Base64, but this results in a 33% data expansion, since every 3 source bytes become 4 encoded bytes. Base64 uses A-Z, a-z, 0-9, +, /, and = as encoding symbols, entirely avoiding commas, quotes (both single and double), braces, parentheses, newlines, and other symbols likely to interfere with the host encoding, whether XML, JSON, or CSV.
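To make the 3-bytes-to-4-symbols mapping concrete, here is a minimal Base64 encoder sketch in C (illustrative, not a hardened implementation):

#include <stdio.h>

/* Every 3 input bytes map to 4 output symbols, hence the 4/3
   (roughly 33%) expansion; '=' pads the final partial group. */
static const char b64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

static void base64_encode(const unsigned char *in, size_t len, char *out)
{
    size_t i, j = 0;
    for (i = 0; i + 2 < len; i += 3)
    {
        unsigned v = (in[i] << 16) | (in[i + 1] << 8) | in[i + 2];
        out[j++] = b64[(v >> 18) & 63];
        out[j++] = b64[(v >> 12) & 63];
        out[j++] = b64[(v >> 6) & 63];
        out[j++] = b64[v & 63];
    }
    if (len - i == 1)
    {
        unsigned v = in[i] << 16;
        out[j++] = b64[(v >> 18) & 63];
        out[j++] = b64[(v >> 12) & 63];
        out[j++] = '=';
        out[j++] = '=';
    }
    else if (len - i == 2)
    {
        unsigned v = (in[i] << 16) | (in[i + 1] << 8);
        out[j++] = b64[(v >> 18) & 63];
        out[j++] = b64[(v >> 12) & 63];
        out[j++] = b64[(v >> 6) & 63];
        out[j++] = '=';
    }
    out[j] = '\0';
}

int main(void)
{
    char out[32];
    base64_encode((const unsigned char *)"Man", 3, out);
    printf("%s\n", out); /* prints "TWFu": 3 bytes in, 4 symbols out */
    return 0;
}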

What if you do not want (or cannot afford) the bloatification of your data incurred by XML or JSON, and are not using a language with built-in binary serialization? Evidently, you will roll your own binary encoding. But to do so, you have to provide not only serialization mechanisms for the basic data types (including, one would guess, the infamous “binary blob”) but also a higher-level syntax of serialization that provides for structure and—especially—error resilience.
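To give a taste of what such a higher-level syntax can look like, here is an illustrative framing sketch in C (the marker value and names are mine, not the post’s): each record is prefixed by a fixed sync marker and a length, so a reader that loses its place after corruption can scan forward to the next marker and resume.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Illustrative sync marker; a real format would pick a pattern
   unlikely to occur in the payload and add a checksum. */
#define SYNC_MARKER 0x5AF3C96BU

/* Write one framed record: marker, payload length, payload. */
static int write_record(FILE *f, const void *payload, uint32_t len)
{
    uint32_t marker = SYNC_MARKER;
    if (fwrite(&marker, sizeof marker, 1, f) != 1) return -1;
    if (fwrite(&len, sizeof len, 1, f) != 1) return -1;
    if (fwrite(payload, 1, len, f) != len) return -1;
    return 0;
}

int main(void)
{
    FILE *f = fopen("records.bin", "wb");
    if (!f) return 1;
    const char *msg = "hello";
    write_record(f, msg, (uint32_t)strlen(msg));
    fclose(f);
    return 0;
}

(A portable format would also fix the byte order of the marker and length; this sketch writes them in native order for brevity.)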



UEID: Unique Enough IDs

30/09/2008

Generating unique, unambiguous IDs for data is something we do often, but we do not always know what level of uniqueness is really needed. In some cases, we want to be really sure that two instances of the same ID identify two copies of the same object or data. In other cases, we only want to be reasonably sure. In yet other cases, we just assume that collisions—two different objects yielding the same ID—are very unlikely and, if need be, we can proceed to further testing to establish equality.

There are many ways of generating IDs, each offering a different level of confidence in uniqueness and suited to different applications.
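One cheap recipe, sketched here with FNV-1a (the function name is mine): hash the object’s bytes, accept that collisions are possible but unlikely, and fall back on a full comparison when certainty matters.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 64-bit FNV-1a over the object's bytes: a "unique enough" ID. */
static uint64_t ueid(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t h = 14695981039346656037ULL; /* FNV offset basis */
    for (size_t i = 0; i < len; i++)
    {
        h ^= p[i];
        h *= 1099511628211ULL;            /* FNV prime */
    }
    return h;
}

int main(void)
{
    const char *obj = "some serialized object";
    printf("%016llx\n", (unsigned long long)ueid(obj, strlen(obj)));
    return 0;
}

With 64 bits, random collisions only become likely around 2^32 objects (the birthday bound), which is often plenty; when it is not, a cryptographic hash raises the bar.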
