Length of Phase-in Codes

23/09/2008

In a previous post, I presented the phase-in codes. I stated that they were very efficient given the distribution is uniform. I also stated that, given the distribution is uniform, they never do worse than $\approx 0.086$ bits worse than the theoretical lower bound of $\lg n$ bits, where $n$ is the number of distinct symbols to code and $\lg$ is a convenient shorthand for $\log_2$. In this post, I derive that result.
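As a sketch of where that figure comes from (assuming the standard phase-in construction, with $n = 2^m(1+f)$ symbols for some $0 \le f < 1$, where $2^{m+1}f$ of the symbols receive $m+1$ bits and the rest receive $m$ bits), the average code length is

$$L(n) = m + \frac{2f}{1+f},$$

while the lower bound is $\lg n = m + \lg(1+f)$, giving a redundancy of

$$R(f) = \frac{2f}{1+f} - \lg(1+f).$$

$R(f)$ vanishes at $f = 0$ and $f = 1$, and setting $R'(f) = 0$ yields $1+f = 2\ln 2$, where the redundancy peaks at

$$R(2\ln 2 - 1) = 2 - \frac{1}{\ln 2} - \lg(2\ln 2) \approx 0.0861.$$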
Phase-in Codes

02/09/2008
Once in a while, we have to encode a series of values to memory or file, about which we either know very little, distribution-wise, or know that they are very close to being drawn from a uniform random variable: basically, they're all equally likely to be drawn. This precludes the use of Huffman (or similar) codes because the distribution is flat and a Huffman code will afford no real compression despite the added complexity.
We do not want to use a fixed number of bits either, because the number of possible values might be very far from the next power of two. For example, say I have 11 different values. I could encode them on 4 bits each, but since $\lg 11 \approx 3.46$, I would waste more than half a bit by doing so. Half a bit wasted for every code. Fortunately, there's an easy, fast encoding that can help us achieve high efficiency: phase-in codes.
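A minimal sketch of the construction (the function names are mine, for illustration): with $k = \lfloor \lg n \rfloor$, the first $2^{k+1} - n$ values receive $k$-bit codewords, and the remaining values, shifted upwards, receive $k+1$ bits.

// Sketch of a phase-in (truncated binary) encoder for values in 0..n-1.
// With k = floor(lg n), the first u = 2^(k+1) - n values get k-bit
// codewords; the remaining n - u values get (k+1)-bit codewords.

unsigned floor_lg(unsigned n) // position of the highest set bit
{
  unsigned k = 0;
  while (n >>= 1) k++;
  return k;
}

void phase_in_encode(unsigned value, unsigned n,
                     unsigned & code, unsigned & nb_bits)
{
  unsigned k = floor_lg(n);
  unsigned u = (1u << (k + 1)) - n; // number of short codewords

  if (value < u)
  {
    code = value;     // short codeword, k bits
    nb_bits = k;
  }
  else
  {
    code = value + u; // long codeword, k+1 bits
    nb_bits = k + 1;
  }
}

For the 11 values above, $k = 3$ and $u = 5$: values 0 to 4 get the 3-bit codewords 000 to 100, while values 5 to 10 get the 4-bit codewords 1010 to 1111, for an average of $39/11 \approx 3.55$ bits, quite close to the lower bound of $\lg 11 \approx 3.46$ bits.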
Optimizing boolean expressions for speed

26/08/2008

Minimizing boolean expressions is of great pragmatic importance. In hardware, it is used to reduce the number of transistors in microprocessors. In software, it is used to speed up computations. If you have a complex expression you want to minimize and you consult a textbook on discrete mathematics, you will usually find a list of boolean laws and identities that you are supposed to use to minimize functions.
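As a tiny example of my own (not from the original post), distributivity and complementation collapse a two-term expression into a single variable:

// (a && b) || (a && !b)
//    = a && (b || !b)    by distributivity
//    = a && true         by complementation
//    = a                 by identity
bool f(bool a, bool b) { return (a && b) || (a && !b); }
bool f_minimized(bool a, bool /*b*/) { return a; }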
Bit blenders: Move bits around quite a bit
12/08/2008

Efficient bit manipulation is often found at the lower levels of many applications, be it data compression, game programming, graphics manipulation, embedded control, or even efficient data representations, including database bitmaps and indexes.

In some cases, one might rely on the often limited features of one's favorite language (C++ in my case) to manipulate bits. In C++ (and in C), there are bit-fields that allow you to define how a piece of memory will be chopped up into regions, each having a specific number of bits. But bit-fields can bring you only so far: they describe how bits are mapped onto a chunk of memory, and they are of little or no help when you need to efficiently move bits around in a somewhat complicated fashion.
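For example (the layout is an illustration of mine, not from the original post), a bit-field can chop a 16-bit word into the three channels of a 5:6:5 RGB pixel:

// A 16-bit 5:6:5 RGB pixel described with bit-fields. Note that the
// exact placement of the fields within the word is
// implementation-defined, which is one reason bit-fields are of
// limited help for serious bit manipulation.
struct rgb565
{
  unsigned short b : 5; // 5 bits of blue
  unsigned short g : 6; // 6 bits of green
  unsigned short r : 5; // 5 bits of red
};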
Branchless Equivalents of Simple Functions
05/08/2008

Modern processors are equipped with sophisticated branch prediction algorithms (the Pentium family, for example, can predict a vast array of patterns of jumps taken/not taken), but if they, for some reason, mispredict the next jump, performance can take quite a hit. Branching to an unexpected location means flushing the pipelines, prefetching new instructions, etc., leading to a stall that lasts for many tens of cycles. In order to avoid such dreadful stalls, one can use a branchless equivalent, that is, code transformed to remove the if-then-elses and therefore the jump prediction uncertainties.

Let us start with a simple function, the integer abs( ) function. abs, for absolute value, returns… well, the absolute value of its argument. A straightforward implementation of abs( ) in the C programming language could be
inline unsigned int abs(int x)
{
return (x<0) ? -x : x;
}
This is simple enough, but it contains a hidden if-then-else. Since the argument, x, isn't all that likely to follow a pattern that the branch prediction unit can detect, this simple function becomes potentially costly, as the jump will be mispredicted quite often. How can we remove the if-then-else, then?
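One classic answer, sketched here under the assumption of 32-bit ints and an arithmetic right shift on signed values (true of essentially all current compilers and targets), is the sign-mask trick:

// mask is 0 when x >= 0 and all ones (-1) when x < 0.
inline unsigned int abs_branchless(int x)
{
  int mask = x >> 31;
  // when x < 0, (x + mask) ^ mask computes ~(x - 1) = -x in two's
  // complement; when x >= 0, it leaves x unchanged.
  return (x + mask) ^ mask;
}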
Posted by Steven Pigeon