Much Ado About Nothing

March 7, 2017

A rather long time ago, I wrote a blog entry on branchless equivalents of simple functions such as sex, abs, min, max. The Sign EXtension instruction propagates the sign bit into the upper bits, and is typically used to promote, say, a 16-bit signed value into a 32-bit variable.

But this time, I needed something a bit different: I only wanted the sign-extended part. Could I do much better than last time? Turns out, the compiler has a mind of its own.
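As a rough illustration (a sketch, not necessarily the code from the full entry), sign extension from 16 to 32 bits can be written without a branch. The names `sex16` and `sex16_upper` are made up for this example, and reading "the sign-extended part" as the upper 16 bits of the result is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 16-bit pattern into a 32-bit signed value, no branch:
   flipping the sign bit and then subtracting it back maps
   0x0000..0x7FFF to 0..32767 and 0x8000..0xFFFF to -32768..-1. */
static inline int32_t sex16(uint16_t x)
{
    return (int32_t)(x ^ 0x8000u) - 0x8000;
}

/* Only the sign-extended part (assumed here to mean the upper 16 bits):
   0xFFFF0000 when bit 15 is set, 0 otherwise. */
static inline uint32_t sex16_upper(uint16_t x)
{
    /* (x >> 15) is 0 or 1; 0u - 1u wraps around to all ones. */
    return (0u - (uint32_t)(x >> 15)) << 16;
}

/* Small self-check showing the expected behavior. */
void check_sex16(void)
{
    assert(sex16(0x7FFF) == 32767);
    assert(sex16(0x8000) == -32768);
    assert(sex16(0xFFFF) == -1);
    assert(sex16_upper(0x7FFF) == 0u);
    assert(sex16_upper(0xFFFF) == 0xFFFF0000u);
}
```

Whether a compiler turns either helper into a single instruction depends on the target and the optimization level.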

Read the rest of this entry »

Slow down, Keep It Cool

December 20, 2011

In a previous post, I discussed how to set the default power policy with Linux (Ubuntu) by detecting the battery/power status: if you’re plugged in, set it to on-demand; if you’re running from the battery, set it to powersave. This is rather crude, but proved effective.

But CPUs that support SpeedStep (or similar) usually support a rather long list of possible speed settings. For example, my i7 supports about 15 different speeds, and “powersave” selects the slowest of all, 1.60GHz (on my laptop, that would be 800MHz). Maybe we could leave the policy to on-demand, but cap the maximum speed to something a bit lower than maximum?
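To make the idea concrete, here is a minimal sketch of choosing a cap from a list of available speeds. The function name `pick_cap_khz`, the percentage parameter, and the frequency table in the self-check are all hypothetical. On Linux, the chosen value (in kHz) would then typically be written, as root, to the cpufreq sysfs file `/sys/devices/system/cpu/cpuN/cpufreq/scaling_max_freq`; that step is left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Return the highest frequency (kHz) in freqs[0..n-1] that does not
   exceed `percent` percent of the maximum listed frequency. If none
   qualifies, fall back to the lowest listed frequency. */
static inline unsigned long pick_cap_khz(const unsigned long *freqs,
                                         size_t n, unsigned percent)
{
    assert(n > 0 && percent <= 100);

    unsigned long max = freqs[0], min = freqs[0];
    for (size_t i = 1; i < n; i++) {
        if (freqs[i] > max) max = freqs[i];
        if (freqs[i] < min) min = freqs[i];
    }

    /* Widen before multiplying so the product cannot overflow. */
    unsigned long long limit = (unsigned long long)max * percent / 100;

    unsigned long best = min;
    for (size_t i = 0; i < n; i++)
        if (freqs[i] <= limit && freqs[i] > best)
            best = freqs[i];
    return best;
}

/* Self-check with a made-up frequency table (values in kHz). */
void check_pick_cap_khz(void)
{
    const unsigned long freqs[] = { 2800000, 2400000, 2000000,
                                    1600000, 800000 };
    assert(pick_cap_khz(freqs, 5, 100) == 2800000);
    assert(pick_cap_khz(freqs, 5, 75) == 2000000);
    assert(pick_cap_khz(freqs, 5, 10) == 800000);
}
```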

Read the rest of this entry »


February 1, 2011

There are plenty of web sites and museums dedicated to the computers of yore. While most of them now seem quaint, and delightfully obsolete, there are probably a lot of lessons we could re-learn and apply today, with our modern computers.

If you followed my blog for some time, you know that I am concerned with efficient computation and representation of just about everything, applied to workstations, servers, and embedded systems. I do think that retro-computing (computing using old computers or the techniques of old computers) has a lot to teach us, and not only from a historical perspective.

Read the rest of this entry »

The Perfect Instruction Set

December 28, 2010

The x86 architecture is ageing, but rather than being re-invented, it only saw incremental extensions (especially for operating system instructions and SIMD) over the last decade or so. Before getting to the i7 core, we saw a long series of evolutions, not revolutions. It all started with the 8086 (and its somewhat weaker sibling, the 8088), which was first conceived as an evolutionary extension to the 8085, which was itself binary compatible with the 8080. The Intel 8080’s lineage brings us to the 8008, a microprocessor with 8 bits of data and 14 bits of address. Fortunately, the 8008 isn’t a double 4004. The successors of the 8086 include (but the list surely isn’t exhaustive) the 80186, the 80286, the 80386 (first in the series to include decent memory protection for multitasking), then the long series of 486, various models of Pentium, Core 2 and i7.

So, just like you can trace the human lineage to apes, then to monkeys, and eventually to rodent-like mammals, the x86 has a long history and is still far from being perfect. Its basic weakness, in my opinion, is that it still uses the 1974 8080 accumulator-based instruction set architecture. And despite a number of very smart architectural improvements (out of order execution, branch prediction, SIMD), it still suffers from the same problems the 8085 did: the instruction set is not orthogonal, it contains many obsolete complex instructions that aren’t used all that much anymore (such as the BCD helpers), and everything has to be backwards compatible, meaning that every new generation is still more of the same, only (mostly) faster.

But what would be the perfect instruction set? In [1], the typical instruction set is composed of seven facets (to which I add an eighth):

Read the rest of this entry »

Suggested Readings: Computer Architecture: A Quantitative Approach

October 17, 2010

John L. Hennessy, David A. Patterson — Computer Architecture: A Quantitative Approach — 4th ed., Morgan Kaufmann, , 704 pp. ISBN 0-12-370490-1


Computer Architecture: A Quantitative Approach is probably the most up-to-date and comprehensive introductory text for computer architecture, covering a broad spectrum of topics from micro-instructions to multi-core parallelism. This book differs from, for example, the aging Advanced Computer Architecture: Parallelism, Scalability, Programmability by Kai Hwang (1992, now out of print) in that it takes a quantitative approach, backing most statements with hard numbers, simulations and benchmarks.

Read the rest of this entry »

Bundling Memory Accesses (Part I)

January 19, 2010

There’s always a question whether having “more bits” in a CPU will help. Is 64 bits better than 16? If so, how? Is it only that you have bigger integers to count further? Or maybe more accessible memory? Well, quite obviously, being able to address a larger memory or perform arithmetic on larger numbers is quite useful because, well, 640KB isn’t all that much, and counting on 16 bits doesn’t get you that far.


But there are other advantages to using the widest registers available for computation. Often, algorithms that scan the memory using only small chunks—like bytes or words—can be sped up quite a bit using bundled reads/writes. Let us see how.
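As a small sketch of what a bundled read can look like (not necessarily the algorithm discussed in the full entry), the following checks whether a buffer is entirely zero, first one byte at a time, then eight bytes at a time. The function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference version: one byte per iteration. */
static inline int all_zero_bytes(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/* Bundled version: eight bytes per iteration, then a byte-wise tail. */
static inline int all_zero_bundled(const unsigned char *p, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        uint64_t w;
        /* memcpy keeps the read valid for any alignment; compilers
           commonly turn it into a single 64-bit load. */
        memcpy(&w, p + i, sizeof w);
        if (w != 0)
            return 0;
    }
    for (; i < n; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/* Self-check: both versions must agree. */
void check_all_zero(void)
{
    unsigned char buf[21] = { 0 };
    assert(all_zero_bytes(buf, sizeof buf) == 1);
    assert(all_zero_bundled(buf, sizeof buf) == 1);
    buf[9] = 0x40;
    assert(all_zero_bytes(buf, sizeof buf) == 0);
    assert(all_zero_bundled(buf, sizeof buf) == 0);
}
```

The bundled loop performs one comparison for every eight bytes instead of eight, which is where the potential speed-up comes from; the actual gain depends on the compiler and the memory system.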

Read the rest of this entry »

Affinities and ulimit

December 1, 2009

The Bash ulimit built-in can be used to probe and set the current user limits. Such limits include the amount of memory a process may use or the maximum number of open files a user can have. While ulimit is generally understood to affect a whole session, it can be used to change the limits of a group of processes using, for example, a sub-shell.

However, the ulimit command is quirky (it expects a particular order for parameters and not all may be set on the same command line) and does not seem to be ageing all that well. For one thing, one cannot set the affinity of processes, which would indirectly control how many and which cores they can use on a multi-core machine.
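For comparison, the limits that ulimit manipulates are also reachable from C through the POSIX `getrlimit` and `setrlimit` calls. The sketch below lowers the soft limit on open file descriptors for the calling process, roughly what `ulimit -S -n 64` does in a shell; the helper name `lower_soft_nofile` is made up for this example. Child processes inherit the limit, much like commands run inside a sub-shell.

```c
#include <assert.h>
#include <sys/resource.h>

/* Lower the soft limit on open file descriptors to at most `n`.
   The limit is never raised. Returns 0 on success, -1 on failure
   (with errno set by getrlimit or setrlimit). */
static inline int lower_soft_nofile(rlim_t n)
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
        return -1;
    if (n < lim.rlim_cur)
        lim.rlim_cur = n;
    return setrlimit(RLIMIT_NOFILE, &lim);
}

/* Self-check: after the call, the soft limit is no higher than 64. */
void check_lower_soft_nofile(void)
{
    struct rlimit lim;
    assert(lower_soft_nofile(64) == 0);
    assert(getrlimit(RLIMIT_NOFILE, &lim) == 0);
    assert(lim.rlim_cur <= 64);
}
```

Affinity is not among the resources this interface covers either, so it has to be set through a separate mechanism.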

Read the rest of this entry »