Ad Hoc Compression Methods: Move To Front

03/03/2009

In a previous post, I’ve presented an ad hoc compression method known as digram (digraph, bigram, bigraph) coding that coded pairs of symbols on used codes from the original alphabet yielding a very simple encode/decode and also sufficiently good compression (in the vicinity of 1.5:1 or so). While clearly not on par with state-of-the-art codecs such as Bzip2 or PAQ8P, this simple codec can still buy you extra space, even when using a very slow/simple processor.

This week, I’ll introduce you to another simple algorithm, Move To Front coding.

Read the rest of this entry »


Ad Hoc Compression Methods: Digram Coding

17/02/2009

Ad hoc compression algorithms always sound somewhat flimsy, but they still have their usefulness whenever the data you have to compress exhibits very special properties or is extremely short. Universal compression algorithms need a somewhat lengthy input to adapt sufficiently to yield good compression. Moreover, you may not have the computing power (CPU speed, RAM and program space) to implement full-blown compressors. In this case also, an ad hoc algorithm may be useful.

Digram (bigram, digraph, bigraph) coding is one of those ad hoc algorithms. Digram coding will compress text (or any source, really) by finding pairs of symbols that appear often and assigning them codes corresponding to used symbols, if any.

hieroglyph

Let us see in detail what digram compression is, and how effective it really is.

Read the rest of this entry »


__MODULE_DEBUG_TOOLS__

03/02/2009

Yesterday, at least at the time I am writing this entry, I spoke about the Zune and its time bug that caused so many of the devices to just freeze on boot. It its clear that this bug is the result of poor testing (in addition to a poor implementation of a conversion using loops rather than explicit divisions and moduli) and this week, I’d like to tell you about one of my tricks to debug code.

Read the rest of this entry »


More Bit-twiddling

27/01/2009

This week, two “quickies”: rounding up and down to the next power of two, and converting efficiently a value to exactly 0 or 1.

Read the rest of this entry »


Honni soit le Hongrois

13/01/2009

The original Hungarian notation is due to Charles Simonyi who invented it sometimes while he was working at Xerox Palo Alto Research Center—the Xerox PARC that gave us the mouse and the first graphical user interfaces. The basic principle of Hungarian naming convention is to prefix the variables with one or many particles, encoding alternatively its type or its intend. This lets programmer write prgszNames as a variable name, which is perfectly legible to one well versed in Hungarian; however, but looks mostly like gibberish to just anyone else.

I recently changed my mind about the Hungarian naming convention. I don’t think it’s that stupid anymore.

Read the rest of this entry »


The Zunes Freezes: Update!

05/01/2009

The faulty code was leaked some time last week, and I’ve looked at it, and, well, it’s surprisingly clean (despite the obvious defect and how they should’ve found it before releasing it (see here)).

Have a look at the ConvertDays routine.

Read the rest of this entry »


The Zune Freezes: A Stupid, Avoidable Bug.

31/12/2008

The few Zune users where alienated today when they discovered that their Zunes just froze during boot. Apparently, a mysterious bug.

The cause of the bug was rapidly diagnosticated. According to Itsnotabigtruck on Zuneboards, the bug can be traced back to a defective date conversion routine. I reproduce here the code for you:

year = ORIGINYEAR; /* = 1980 */

while (days > 365)
 {
  if (IsLeapYear(year))
   {
    if (days > 366)
        {
         days -= 366;
         year += 1;
       }
   }
  else
   {
    days -= 365;
    year += 1;
   }
 }

Can you see what’s just wrong about that code?

Read the rest of this entry »


The True Cost of Calls

30/12/2008

The cost of virtual functions is often invoked as a reason to C++’s poor performance compared to other languages, especially C. This is an enduring myth that, like most myths, have always bugged me. C++ myths are propagated by individuals that did not know C++ very well, tried it one weekend in 1996, used a bad compiler, knew nothing about optimization switches, and peremptorily declared C++ as fundamentally broken. Well, I must agree that C++ compilers in the mid-90s weren’t all that hot, but in the last fifteen years, a lot have been done. Compilers are now rather good at generating efficient C++ code.

However, the cost of calls, whether or not they are virtual, is not dominated by the the call itself (getting the address to jump to and jumping) but by everything else surrounding the call, like the stack setup and argument passing. Let us debunk that myth by looking at what types of calls are available in C and C++, how they translate to machine code, and see how faster or slower they are relative to each other.

Read the rest of this entry »


Safer Integer Constants (Part II)

02/12/2008

In C and C++, the behavior of integer promotions is not the same whether we use signed integers or unsigned integers. In bit twiddling hacks, it is not immediately apparent that it is a problem that may lead to unexpected result whenever code is ported to a different architecture. This is a recurring topic whenever I discuss portability with friends. Very often, they argue that if it works on Windows and 32 bits x86 Linux, that’s pretty much all there is to it.

Of course, I couldn’t disagree more. I have learned the hard way.

Read the rest of this entry »


Safer Float Types

25/11/2008

This week, we’ll be following last week’s post, where we looked at type-safe integer constants, with floating point constants, that is, float and double.

Read the rest of this entry »