rdrand

November 4, 2019

Last week (at the time of writing, anyway), Ars Technica reported a serious bug in AMD’s implementation of rdrand, an instruction that helps you generate random numbers. Apparently, on (some) Ryzen 3000, 0xfff…ff is “random”.

I recently got an AMD Ryzen 9 3900x, and I wondered if I have the bug as well.

Read the rest of this entry »


Yet More __builtins

January 21, 2014

So last week we saw how to use some of GCC’s built-ins, this week, let’s have a look at how we can create our own, if need be. Say because you need to have access to some instruction and that GCC does not offer the corresponding built-in.

coin-reverse-small

To do so, we’ll use a bit of the C preprocessor and GCC’s inline assembly extension.

Read the rest of this entry »


GCC Built-ins

January 7, 2014

In the discussion of The Speed of GCD, Daniel Lemire remarked that one could use the compiler-specific intrinsic function __builtin_ctz to get a good speed-up on binary GCD. That remark made me look into all the others intrinsics and built-ins offered by GCC.

amonite

Let’s have a look to at least a few of them!

Read the rest of this entry »


Faster Collatz

May 1, 2012

Quite a while ago, I presented the Collatz conjecture and I was then interested in the graphical representation of the problem—and not really going anywhere with it.

In this entry, let us have a look at the implementation of the Collatz function.

Read the rest of this entry »


Trigonometric Tables Reconsidered

February 28, 2012

A couple of months ago (already!) 0xjfdube produced an excellent piece on table-based trigonometric function computations. The basic idea was that you can look-up the value of a trigonometric function rather than actually computing it, on the premise that computing such functions directly is inherently slow. Turns out that’s actually the case, even on fancy CPUs. He discusses the precision of the estimate based on the size of the table and shows that you needn’t a terrible number of entries to get decent precision. He muses over the possibility of using interpolation to augment precision, speculating that it might be slow anyway.

I started thinking about how to interpolate efficiently between table entries but then I realized that it’s not fundamentally more complicated to compute a polynomial approximation of the functions than to search the table then use polynomial interpolation between entries.

Read the rest of this entry »


Being Shifty

July 27, 2010

Hacks with integer arithmetic are always fun, especially when they’re not too hard to understand (because some are really strange and make you wonder what the author was thinking when he wrote that). One such simple hack is to replace divisions or multiplies by series of shifts and additions.

However, these hacks make a lot of assumptions that aren’t necessarily verified on the target platform. The assumptions are that complex instructions such as mul and div are very slow compared to adds, right shifts or left shifts, that execution time of shifts only depends very loosely on the number of bit shifted, and that the compiler is not smart enough to generate the appropriate code for you.

Read the rest of this entry »


Is Python Slow? (Part II)

June 8, 2010

In a previous post I expressed my worries about Python being excruciatingly slow and I used a toy problem to compare the speed of Python to programs in other several languages, including C.

Of course, all kind of people complained that I couldn’t compare a dynamic, interpreted language with static, compiled languages. First, let met tell you that I sure can. First, the goal was to measure speed, and not the effects of type system of the language (although logically correlated) nor the programming paradigm: the amount of CPU used to solve a given problem was the primary (if not only) point in interest.

But to be fair to Python, I extended the tests to other interpreted, dynamic languages, such as Lua, Perl, PHP and JavaScript. I also added Pascal and Haskell in the compiled languages groups.

Read the rest of this entry »