Much Ado About Nothing

March 7, 2017

A rather long time ago, I wrote a blog entry on branchless equivalents of simple functions such as sex, abs, min, max. The Sign EXtension instruction propagates the sign bit into the upper bits, and is typically used in the promotion of, say, a 16-bit signed value into a 32-bit variable.

But this time, I needed something a bit different: I only wanted the sign-extended part. Could I do much better than last time? Turns out, the compiler has a mind of its own.
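As a rough illustration of the two operations mentioned above, here is a minimal sketch in C. The names `sex16` and `sign_mask` are hypothetical, not taken from the original entry, and the code assumes a two's-complement target where the compiler implements `>>` on negative signed values as an arithmetic shift (as GCC and Clang do).

```c
#include <stdint.h>

/* Full sign extension: promote a 16-bit pattern to a signed 32-bit value.
   The conversion to int16_t is implementation-defined for values above
   INT16_MAX; GCC and Clang wrap modulo 2^16, which is assumed here. */
static inline int32_t sex16(uint16_t x)
{
    return (int32_t)(int16_t)x;
}

/* Only the sign-extended part: every bit is a copy of the sign bit,
   so the result is -1 (all ones) for negative x and 0 otherwise.
   Assumes an arithmetic right shift on signed values. */
static inline int32_t sign_mask(int32_t x)
{
    return x >> 31;
}
```

For example, `sex16(0xFFFF)` yields -1, while `sign_mask(-5)` yields -1 and `sign_mask(5)` yields 0. What the compiler actually emits for either function depends on the target and the optimization level.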

Read the rest of this entry »

Square Roots (part IV)

November 8, 2016

In a previous post, we noticed that

\sqrt{2^k} \leqslant \sqrt{n} = \sqrt{2^k+b} \leqslant \sqrt{2^{k+1}},

where 0 \leqslant b < 2^k, could be used to kick-start Newton's (or another) square root finding algorithm. But how fast can we find k and b in this decomposition?
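To make the decomposition concrete, here is a portable baseline sketch in C that finds k and b with a simple shift loop. The function name `decompose` is hypothetical, and this is only a reference version to compare faster approaches against, not the method from the post.

```c
#include <stdint.h>

/* Decompose n >= 1 as n = 2^k + b with 0 <= b < 2^k.
   The result is meaningless for n == 0. */
static inline void decompose(uint32_t n, unsigned *k, uint32_t *b)
{
    unsigned e = 0;
    while ((n >> e) > 1)      /* stop when only the top set bit remains */
        e++;                  /* e ends up as floor(log2(n)) */
    *k = e;
    *b = n - ((uint32_t)1 << e);
}
```

With n = 10, the loop stops at k = 3, leaving b = 2, since 10 = 2^3 + 2. The loop takes up to 31 iterations for a 32-bit input, which is the cost a faster method would have to beat.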

Read the rest of this entry »

Of tails.

November 25, 2014

In a previous post, I explored the effect of pruning on a recursive function, namely, the Collatz function. Richard (see comment) asked “does your compiler know about tail recursion?” Well, I didn’t know for sure. Let’s find out.
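To show what the question is about, here is a small sketch in C contrasting a plain recursive Collatz step counter with a tail-recursive one. Both functions are hypothetical illustrations, not the code from the earlier post, and whether a given compiler turns the second form into a loop depends on the compiler and its flags (GCC enables sibling-call optimization at -O2).

```c
#include <stdint.h>

/* Not a tail call: the "1 +" is evaluated after the recursive call
   returns, so each level keeps its own stack frame unless the compiler
   rewrites the function itself. */
static unsigned collatz_steps(uint64_t n)
{
    if (n == 1) return 0;
    return 1 + collatz_steps((n & 1) ? 3 * n + 1 : n / 2);
}

/* Tail-recursive variant: the running count travels in an accumulator,
   so the recursive call is the last thing the function does and can be
   compiled as a jump. */
static unsigned collatz_steps_tail(uint64_t n, unsigned steps)
{
    if (n == 1) return steps;
    return collatz_steps_tail((n & 1) ? 3 * n + 1 : n / 2, steps + 1);
}
```

Both versions return 8 for n = 6 (6, 3, 10, 5, 16, 8, 4, 2, 1). Comparing the assembly produced for each (for instance with `gcc -O2 -S`) is one way to check whether the recursion was eliminated.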


Read the rest of this entry »

More __builtins

January 14, 2014

Last week we discussed GCC intrinsics a bit. This week, let’s have a look at what kind of speed-ups we can get, and how the use of intrinsics affects code generation.


Sometimes, I do strange things. I mean, my experiments aren’t necessarily self-evident, and sometimes, I need performance from primitives that usually are not bottlenecks—such as computing the GCD. This time, I need to get k and b in n=2^k+b as fast as possible. Let’s have a look at how intrinsics help us.
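As a sketch of how an intrinsic can help here, the version below uses GCC's `__builtin_clz` (count leading zeros) to get k directly. The function name `decompose_clz` is hypothetical, the code assumes a 32-bit `unsigned int`, and it is not necessarily the variant measured in the post.

```c
#include <stdint.h>

/* Decompose n >= 1 as n = 2^k + b with 0 <= b < 2^k, using the GCC/Clang
   intrinsic __builtin_clz. The intrinsic's result is undefined for 0, so
   n == 0 must be excluded by the caller. Assumes unsigned int is 32 bits. */
static inline void decompose_clz(uint32_t n, unsigned *k, uint32_t *b)
{
    unsigned e = 31u - (unsigned)__builtin_clz(n);  /* index of top set bit */
    *k = e;
    *b = n ^ ((uint32_t)1 << e);                    /* clear the top set bit */
}
```

On targets with a hardware bit-scan or leading-zero-count instruction, the intrinsic typically compiles to a handful of instructions with no loop; elsewhere the compiler falls back to a library routine.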

Read the rest of this entry »

Is Python Slow?

November 10, 2009

Python is a programming language that I learnt somewhat recently (something like 2, 3 years ago) and that I like very much. It is simple, to the point, and has several functional-like constructs that I am already familiar with. But Python is slow compared to other programming languages, or at least it felt slow: it was unclear to me just how slow it really was.


So I have decided to investigate by comparing the implementation of a simple, compute-bound problem, the eight queens puzzle generalized to any board dimensions. This puzzle is most easily solved using, as Dijkstra did, a depth-first backtracking program, using bitmaps to test rapidly whether or not a square is free of attack. I implemented the same program in C++, Python, and Bash, and got help from friends for the C# and Java versions. I then compared the resulting speeds.
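For readers unfamiliar with the bitmap approach, here is a minimal sketch in C of a depth-first backtracking solver that counts solutions. It is the common bitmask formulation, not necessarily the program benchmarked in the post, and the names `place` and `count_queens` are hypothetical.

```c
#include <stdint.h>

/* Place one queen per row. For the current row, cols, diag1 and diag2 are
   bitmaps of the squares attacked along columns and the two diagonals;
   all has one bit set per column of the board. */
static uint64_t place(uint32_t all, uint32_t cols, uint32_t diag1, uint32_t diag2)
{
    if (cols == all)              /* every column holds a queen: one solution */
        return 1;

    uint64_t count = 0;
    uint32_t free_squares = all & ~(cols | diag1 | diag2);
    while (free_squares) {
        uint32_t bit = free_squares & (0u - free_squares);  /* lowest free square */
        free_squares ^= bit;
        /* Diagonal attacks shift by one column as we move to the next row. */
        count += place(all, cols | bit, (diag1 | bit) << 1, (diag2 | bit) >> 1);
    }
    return count;                 /* 0 here means a dead end: backtrack */
}

/* Count the solutions on an n-by-n board, for 1 <= n <= 31. */
static uint64_t count_queens(unsigned n)
{
    return place(((uint32_t)1 << n) - 1, 0, 0, 0);
}
```

Calling `count_queens(8)` returns 92, the known number of solutions for the classic eight queens puzzle. Testing whether a square is free costs one AND and one OR per row, which is what keeps the inner loop cheap.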

Read the rest of this entry »