GCC Built-ins

January 7, 2014

In the discussion of The Speed of GCD, Daniel Lemire remarked that one could use the compiler-specific intrinsic function __builtin_ctz to get a good speed-up on binary GCD. That remark made me look into all the others intrinsics and built-ins offered by GCC.


Let’s have a look to at least a few of them!

Read the rest of this entry »

Ad Hoc Compression Methods: RLE (part II)

April 15, 2009

Yestermorning I presented an ad hoc compression method known as RLE, or run-length encoding and I got quite more reactions than I thought I would. Many suggested doing plenty of other stuff despite the fact that the post was about RLE, not about <insert your favorite compression method>. But something very nice happened: people got interested and many suggestions came forth.

I decided to implement the median-based prediction method as suggested by Dark Shikari here. Let us see the results!

Read the rest of this entry »

Ad Hoc Compression Methods: RLE

April 14, 2009

Suppose you have been sub-contracted by a video game company, say Nintendo, to come up with an low power, low-memory, implementation of a popular video game, say, an installment of the infamous Pokémon series. The envisaged device is very slow (it has a 2MHz processor) but has a 220×176 pixels color screen capable of displaying 32768 different colors. It also has a hardware sprite mode, but sprites are limited to 4 distinct colors. You have to encode the complete database of Pokémons, which are several hundreds, in as little space as possible. Each image vary somewhat in size, but can only have 4 distinct colors from the possible palette of 32768 colors.

A typical Pokémon would look like:

A typical Pokémon image, shown enlarged and in original size.

A typical Pokémon image, shown enlarged and in original size.

Since the artists put a lot of efforts into drawing the critters in 4 colors, and because of the hardware requirements, you cannot use a compression algorithm that modifies or creates more colors than the original image had, therefore excluding JPEG. Moreover, the JPEG codec would be too complex to port to the platform considered. You must therefore use a simple lossless codec.

We will be comparing 5 codecs for the Pokémon image database. We will be looking at a “block” codec, which will serve as a comparison for a non-RLE codec. We will be looking at 5 variations on the theme of run length encoding, or RLE, namely, the All Counts, the Bit Select, and Auto-Trigger RLE. The last variant will help Auto-Trigger RLE by using predictors, transforming the input image into one that is easier to compress using RLE.

Read the rest of this entry »

Compact Tree Storage

April 7, 2009

Implementing data structures in a way that uses efficiently memory should always be on your mind. I do not mean going overboard and micro-optimizing memory allocation right down to the bit. I mean organize data structures in memory so that you can avoid pointers, so that you can use contiguous memory segments, etc. Normally, minimizing storage by avoiding extra pointers when possible will benefit your program in at least two ways.

First, the reduced memory requirement will make your data structure fit in cache more easily. Remember that if pointers are 4 bytes long in 32 bits programming, they are 8 bytes long in 64 bits environments. This yields better run time performance because you maximize your chances of having the data you need in cache.

Second, contiguous memory layouts also allow for efficient scans of data structures. For example, if you have a classical binary tree, implemented using nodes having each two pointers, you will have to use a tree traversal algorithm, possibly recursive, to enumerate the tree’s content. If you don’t really care about the order in which the nodes are visited, what’s quite cumbersome.

It turns out that for special classes of trees, complete trees, there is a contiguous, and quite simple, layout.

Read the rest of this entry »

Your Automatisms Betray You

March 11, 2009

Yesterday someone dropped on the IRC channel where my fellow programmers, computer enthusiasts, and I hang out to get help to find a bug. He uses one of the paste sites (like pastebin.ca, pastebin.com, or rafb.net), pastes his piece of offending code, and so we get a look at the code. Of course, I go over the short program, notice a mistake in the scanf but it took me a full two minutes to notice the loop:

for (c=0; c++; c<n )

That kind of bug always takes a while to find because we don’t read what’s actually written, but what we think is written, unless we pay the utmost attention to the code—what we should be doing anyway, but do not always. Usually, you zero in on that kind of bug rapidly, as you guide your search from the bug’s symptoms which leads you to defect’s approximate location. If you’re like me—write a little, test a lot—you find those bugs right away most of the times. However, even if you zero in rapidly, you still get a coarse-grained location: module, class, function.

Read the rest of this entry »

Ad Hoc Compression Methods: Move To Front

March 3, 2009

In a previous post, I’ve presented an ad hoc compression method known as digram (digraph, bigram, bigraph) coding that coded pairs of symbols on used codes from the original alphabet yielding a very simple encode/decode and also sufficiently good compression (in the vicinity of 1.5:1 or so). While clearly not on par with state-of-the-art codecs such as Bzip2 or PAQ8P, this simple codec can still buy you extra space, even when using a very slow/simple processor.

This week, I’ll introduce you to another simple algorithm, Move To Front coding.

Read the rest of this entry »

Ad Hoc Compression Methods: Digram Coding

February 17, 2009

Ad hoc compression algorithms always sound somewhat flimsy, but they still have their usefulness whenever the data you have to compress exhibits very special properties or is extremely short. Universal compression algorithms need a somewhat lengthy input to adapt sufficiently to yield good compression. Moreover, you may not have the computing power (CPU speed, RAM and program space) to implement full-blown compressors. In this case also, an ad hoc algorithm may be useful.

Digram (bigram, digraph, bigraph) coding is one of those ad hoc algorithms. Digram coding will compress text (or any source, really) by finding pairs of symbols that appear often and assigning them codes corresponding to used symbols, if any.


Let us see in detail what digram compression is, and how effective it really is.

Read the rest of this entry »

Phase-in Codes

September 2, 2008

Once in a while, we have to encode a series of values to memory or file, of which we know either very little about, distribution-wise, or are very close to being drawn from an uniform random variable: basically, they’re all equally likely to be drawn. This precludes the use of Huffman (or like) codes because the distribution is flat and Huffman code will afford no real compression despite the added complexity.

We do not want to use a fixed number of bits either because the number of possible values might be very far from the next power of two. For example say I have 11 different values. I could encode them on 4 bits each, but I would waste more than half a bit by doing so. Half a bit wasted for every code. Fortunately, there’s an easy, fast encoding that can help us achieve high efficiency: phase-in codes.

Read the rest of this entry »


Get every new post delivered to your Inbox.

Join 65 other followers