## Is Python Slow? (Part II)

In a previous post I expressed my worries about Python being excruciatingly slow, and I used a toy problem to compare the speed of Python to that of programs in several other languages, including C.

Of course, all kinds of people complained that I couldn’t compare a dynamic, interpreted language with static, compiled languages. Let me tell you that I certainly can. The goal was to measure speed, not the effects of the language’s type system (although the two are logically correlated) nor the programming paradigm: the amount of CPU time used to solve a given problem was the primary (if not only) point of interest.

But to be fair to Python, I extended the tests to other interpreted, dynamic languages, such as Lua, Perl, PHP and JavaScript. I also added Pascal and Haskell to the compiled-languages group.

The toy problem this time is to compute the factorial of a given number using recursion and successive additions. This is a most inefficient way of computing a factorial, but it stresses three major aspects of the languages being tested: simple function calls, loop control, and the ability to perform simple integer arithmetic. The C99 version of the function reads as follows:

```c
uint64_t facto(int n)
{
    if (n == 1)
        return 1;
    else
    {
        uint64_t s = 0;
        for (int x = 0; x < n; x++)
            s += facto(n - 1);
        return s;
    }
}
```

So, it’s clearly something simple, no fancy polymorphism, no dynamic typing, just the stuff one would use as building blocks for (real) algorithms.
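For reference, here is what a direct Python translation looks like. The post doesn't reproduce its actual Python source, so this is a sketch of the same algorithm:

```python
# Direct Python translation of the C version: same recursion,
# same loop, same O(n!) count of calls and additions.
def facto(n):
    if n == 1:
        return 1
    s = 0
    for x in range(n):
        s += facto(n - 1)
    return s

print(facto(5))  # prints 120
```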

The languages considered, in addition to Python 2.6.4:

Lua. The very Pascal-ish Lua allows a very direct translation of the C version. Lua’s mixed-paradigm programming makes it easy to learn for imperative and object-oriented programmers. The typical application of Lua seems to be extension scripting for games, but it is used in other applications as well. Note: no, I did not try LuaJIT, because at the time of writing it is not available for 64-bit platforms; it is still in beta. Version 5.1.4 was used.

JavaScript. I know nothing about JavaScript, and my friend Félix Morency contributed the JavaScript translation for this experiment. The difficult thing about testing JavaScript is that there are so many implementations: Firefox, IE, Chrome, and others. I chose, arbitrarily and out of convenience, to use Firefox’s 3.5.9 implementation.

Perl. Another language commonly used for shell and application scripting, known as the “duct tape of the Internet”. Like the camel, its de facto mascot, Perl is a curiously misshaped monster perfectly adapted to its environment—the only animal that would have been more ironic than the camel is the platypus. Or the hypocamptopus. Perl 5.10.0 was used.

PHP. PHP is another popular language for Internet-scale applications such as Wikipedia, YouTube and countless others [ref]. Roughly comparable to Perl in many aspects, it nevertheless suffers, in my opinion, from a very weirdly designed set of libraries—but that may be the subject of another entry. PHP 5.2.10 with Zend 2.2.0 was used.

Bash. Bash is the default shell for Debian-based distributions, which makes it in many ways a natural first choice for glue-code programming. Bash isn’t incredibly subtle, but it gets the job done. Once you get to know it, it is particularly easy to create pipes between processes. It is, however, painfully slow. Bash v. 4.0.33(1) was used.

Haskell. A purely functional programming language, Haskell is an evolution of Miranda, a programming language that I used for a long time. The basic paradigm—functional programming—excludes “destructive modifications (updates). According to this restriction, variables are used in a mathematical sense, with identifiers referring to immutable, persistent values.” [ref:Wikipedia] So programming in Haskell is not much more difficult than writing recurrences on paper. What may be most alien to imperative-style programmers is that recursion is the main mechanism for iteration. Lazy evaluation may also cause performance surprises, as it is not always straightforward to know what the compiler will do. The Glorious Glasgow Haskell Compiler v 6.10.4 was employed.

C. Primarily used for high-performance, low-level programming. One might say C is merely a fancy macro-assembler, but that would be unjust; one has to know what the language is about: giving the right amount of abstraction and control to programmers. Gcc 4.4.1 was used.

C++. The object-oriented evolution of C. It carries over most of C’s weaknesses but also offers powerful tools such as object-oriented programming and meta-programming. Having high-level tools while being able to balance between abstraction and performance in a controlled way makes C++ my personal favorite. G++ 4.4.1 was used.

Pascal. Pascal was a long-time favorite of mine, especially in its Turbo Pascal incarnations, which brought, with version 5.5 in early 1989 (if I remember correctly), object orientation to a mainstream Pascal compiler for the first time. Of course, object orientation changed a lot of things for the better. Pascal is still elegant despite its fall in popularity, though looking back at my 90s-era code I appreciate C’s terseness more. The Free Pascal Compiler 2.2.4-3 was used.

*
* *

About JITs. People keep suggesting that I use the available JIT for language X or language Y. Unfortunately, not all JITs are available. First, you require a JIT port for your OS. Python JITs are either incomplete but getting there (such as the Shed Skin project, which is my personal favorite), do not support x86_64 (which happens to be my platform), or are not in my distribution’s repositories, either because they failed integration testing or for some other reason. The first two reasons are show-stoppers: if it’s not available for my platform or supports only a subset of the language, that’s pretty much the end of it. The third one is also a show-stopper, although you can debate whether it should be. At the 2009 Linux Symposium, co-author François-Denis Gonthier and I explained why tar-ball installs aren’t the solution (amongst other things) [Slides, OpenOffice format]. That is, you can install from tar-balls, but you can’t get the automagic updates, and you have to deal with dependencies which can, in turn, have dependencies outside the repositories: something that breaks every time you update. So, official repositories or GTFO!

*
* *

So I ran the above code and measured time to completion within the programming language itself, excluding initial start-up times. The results are shown in the next two graphs. Notice the log-scale for the y axis:

Absolute (log-)Times

Relative (log-)Times
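Measuring time to completion from within the language can be sketched as follows; the actual harness isn't reproduced in the post, so this Python version only illustrates the assumed shape of the measurement:

```python
import time

def measure(f, *args):
    # Time a single call with a wall-clock timer; the clock starts
    # only after the interpreter is loaded, so start-up is excluded.
    start = time.time()
    result = f(*args)
    return result, time.time() - start

# e.g.: result, elapsed = measure(facto, 12)  # with facto as defined above
```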

The results are meaningful starting at problem size 8 or so, because beyond that point execution times significantly exceed the timer resolution. As with the 8-queens chess problem, Bash comes last, exceedingly slower than all the other languages, about 200000× slower than C, the reference implementation. The last 3 points for Bash are extrapolated (otherwise I’d still be waiting for the results).

Then comes Perl, about 300× slower than C. That comes as a surprise, because I never heard anyone complain about Perl being slow (though about plenty of other things). Then comes PHP, at merely 150-170× slower than C, and then Python at 140× slower. Big surprise: JavaScript is twice as fast as Python, clocking in at 70× slower than C, even on an implementation that is known to be troublesome.

Lua (sans JIT) comes out the big champion at 45× slower than C: about twice as fast as JavaScript, three times faster than Python, and 4200× faster than Bash. One thing I noticed that influences the performance of Lua—although not in this particular experiment—is the use of aliases. Lua lets you bring a specific library symbol into the current scope using local func=thislib.func. In another experiment (the topic of a future post), speed-ups from local aliases alone were in the order of 20-30%.
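Incidentally, the same trick pays off in Python, where a local alias skips the repeated global-and-attribute lookup inside a hot loop. A small illustrative sketch, not part of the original benchmark:

```python
import math

def slow_sqrt_sum(n):
    s = 0.0
    for i in range(n):
        s += math.sqrt(i)   # looks up 'math', then '.sqrt', on every iteration
    return s

def fast_sqrt_sum(n):
    sqrt = math.sqrt        # resolved once, bound to a fast local name
    s = 0.0
    for i in range(n):
        s += sqrt(i)
    return s
```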

For some reason, the C++ version is a wee bit faster (5-10%) than the C implementation, despite being the very same source code for the function. The Free Pascal Compiler also yielded good code, coming within 30% (slower) of the C version.

Haskell isn’t shown on the charts, as you may have noticed. Since recursion is its primary iteration mechanism, Haskell knows a lot about recursive functions. It knows so much, indeed, that the compiler is smart enough to figure out what the function does and replaces it by a direct computation of $n!$, which is damn impressive. In all cases, the time remains constant and doesn’t explode as it does for the other languages.

```haskell
import System.Environment (getArgs)

rep x n
    | n == 1    = [x]
    | otherwise = x : rep x (n - 1)

facto n
    | n == 1    = 1
    | otherwise = sum (rep (facto (n - 1)) n)

main = do
    args <- getArgs
    let n = (read $ head args) :: Int
    print . facto $ n
```

Also, you may have noticed that the curves for Pascal and JavaScript are identical for the first few problem sizes. That’s because their timers are too coarse and simply return 0 as the elapsed time; a default value of 0.0001s was used to fill the gaps. Some values have also been extrapolated because the runs would otherwise take zillions of years to complete. The details are given in the gnumeric spreadsheet you can download (see the end of the post).

*
* *

So Python isn’t that slow after all. Perl and PHP are much slower, but JavaScript and Lua are much faster. That still raises many questions. One is about ease of coding, and how a language helps you get correct code faster. Neither Perl nor PHP helps you much on this front, yet they’re very popular. Python is probably better. I can’t really comment on Lua as of now because I haven’t programmed enough with it to draw any worthwhile conclusions. Clearly C and C++ do not help you much either; they’re rather unforgiving. So, basically, I don’t find the argument of using a language that’s “easier” all that convincing.

The other question is how the use of a given language impacts your infrastructure. If you have a language that is, once all other effects such as I/O wait are factored out, five times slower than another, how many more servers do you need for your application? Maybe not five times as many, but probably at least twice as many. In other words, investing a bit more in development time may be worthwhile in the long run because you’ll save on hardware and on all that comes with it: floorspace, power, cooling.

Maybe the better strategy is to combine both slow, “easy” languages and fast, “hard” languages. Having your first iterations in Python or PHP may be quite sufficient for your initial needs; as time goes by, you progressively replace pieces of your software with pieces programmed in the fast, hard languages. This strategy may help you stay competitive because, in the end, it’s not always possible to just slap more hardware in there.

*
* *

You can download the spreadsheet (in gnumeric format) and the source code here. In the gnumeric spreadsheet, pale red squares mean that the value is either interpolated or set arbitrarily (for example, a program that returned 0 elapsed time because of too coarse a timer received a default value).

### 14 Responses to Is Python Slow? (Part II)

1. A couple of notes of caution regarding C:
– gcc 4.4 generates code that’s ~25% slower than msvc10’s

– without additional persuasion, msvc10 does a smart thing: it recognizes facto as a pure function, calls it once, and then adds the result in a loop. This optimization is performed only on the external call; the body of the facto function behaves as specified, i.e. N recursive calls to facto. For people who’ll compare MSVC against other languages with this code: expect scary results.

2. Steven Pigeon says:

There is also ICC (the Intel C/C++ Compiler) and LLVM that one might want to check out. I could have checked Microsoft Visual C++, but the thing is, that would be going a long way out of my way.

I am sure Haskell does something like this: either it recognizes the calls and collapses them into a factorial, or it memoizes the results of facto, it being a pure function. Either way: instantaneous results.

When you say ~25% slower, do you mean for this particular example, or in general? (I would find the latter hard to believe.)

3. Félix C. Morency says:

I did some tests and LuaJIT2-beta4 seems to be 2.6x faster than Lua5.1.

• Steven Pigeon says:

Can you provide a bit more information? Like, give the timings for at least ~10 points?

4. Systemfault says:

Clang++ (svn trunk from a week ago) gives some interesting results for the C++ code. I haven’t analyzed the generated assembly, but it seems that it caches the result of the first facto computation and just reuses it (a bit like Haskell would do), so it gives the result instantly for any N. That compiler never ceases to impress me.

```
13 6227020800 0.000000
14 87178291200 0.000000
15 1307674368000 0.000000
```

Nice post,
Roger.

5. Colin Alston says:

I’ve noticed your Python code itself is less than optimal. Granted, Python is not optimised for big numeric operations, which is why NumPy and other optimised extensions exist. On the other hand, you’ve not made any use of generators and other time-saving Python idioms.

Yes, if you try to write C code in Python, it will be slower. Much like if you try to write Python code in C, you’ll give up or commit suicide.

• Steven Pigeon says:

The most efficient way of computing the factorial would be to use a product, and the most efficient, pythonèsque way of computing the factorial would be something like reduce(operator.mul, xrange(1,n+1)), which is instantaneous.

Already replacing the recursion by return sum([facto(n-1)]*n) yields an incredible speed-up because facto(n-1) is called once and replicated n times, short-cutting most of the recursion, reducing the number of operations from $O(n!)$ to something of $O(n^2)$.

However, using other pythonèsque constructs that remain in line with the original intent, like, say, return sum( [ facto(n-1) for x in xrange(1,n+1) ] ), yields worse performance than the original code: something like 3s vs 2.6s for n=10.

The code was deliberately constructed so that it would yield $O(n!)$ calls and operations.
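The variants discussed in this reply, gathered into one runnable sketch (written with Python 3's functools.reduce and range in place of the reduce and xrange of the original Python 2 discussion):

```python
from functools import reduce
import operator

def fact_product(n):
    # the most pythonesque version: a straight product, instantaneous
    return reduce(operator.mul, range(1, n + 1))

def facto_replicated(n):
    # facto(n-1) computed once, then replicated n times: O(n^2) additions
    if n == 1:
        return 1
    return sum([facto_replicated(n - 1)] * n)

def facto_comprehension(n):
    # faithful to the original intent: O(n!) calls and additions
    if n == 1:
        return 1
    return sum(facto_comprehension(n - 1) for _ in range(n))
```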

6. […] C++, and that’s some what new for me. Of course, I’ve criticized Python’s dismal performance on two occasions, getting me all kind of comments (from “you can’t compare performance […]

7. Gabbar Singh says:

ksh is ~100 times faster than bash:

```
#!/bin/ksh

solutions=0

facto()
{
    typeset n=$1
    if [ $n -gt 1 ]
    then
        typeset -i x
        typeset -i s=0
        for ((x=0; x<n; x++))
        do
            s=$((s+$( facto $((n-1)) ) ))
        done
        echo $s
    else
        echo 1
    fi
}

start=$(date +"%s.%N")
solutions=$(facto $1)
stop=$(date +"%s.%N")

printf "%d %d %0.6f\n" $1 $solutions $(echo $stop - $start | bc )

exit 0
```
• Steven Pigeon says:

Not surprised. Bash is incredibly slow. Wonder why, though, since it’s at the core of a lot of things (like install scripts, configuration scripts, and many system-level scripts).

8. Gabbar Singh says:

Had ksh, with its many improvements over bash, been adopted by more folks as their scripting shell, it’s possible that Perl would never have come about, saving many of us vast amounts of pain.

I was playing with some non-algorithmic optimization of the ksh code, and the most puzzling improvement was a 10% gain in speed when not assigning $1 to n, but just using $1 inline.

9. Meh says:

I don’t really think this is true as much anymore; comparing static and dynamic languages is like comparing apples and oranges. There will always be some minute detail that separates the two. In the long run, if you consider the fact that nothing on computers can exist without machine code, that argument becomes quite silly. By that logic, you might as well have your team writing in assembly, unless you know something faster than being close to the metal.

In fact, the problem is that some languages, like Python, just can’t have an implementation close to the machine. If any construct requires that you look up a function definition every time because it might have changed, it’s not so much a problem of dynamic vs static, or compiled vs interpreted; it’s a problem of the language being too fancy for its own good. If you’re going to allow function redefinition, there’s nothing that prevents you from doing so in a sane fashion. For example, look it up once when the scope is created, and make it just too bad for you if you decide to redefine it while in that scope: you’ll use the previous definition for as long as the scope lives.

Writing everything in assembly is essentially a strawman argument. While it’s really a fun dream that every (high-level) programming language statement translates directly to a few machine instructions, the reality is that Python is still several HUNDRED times slower than a compiled language. The only language I know that is worse is Bash. But that’s not the point.

How does Python scale? By adding computers. And not 2-3 times more computers: hundreds more. That’s the point.

It’s not that Python is slow so lolol the printing dialog takes 100ms more to appear. It’s that Python is so slow it *is* a problem for application scaling. And it’s not something I made up. Python is slow. It has a bad multithreading model. It is a problem.

I’ve just ported the code to Julia (http://julialang.org). Julia is a high-level, high-performance, dynamic programming language with a syntax similar to Ruby, Python and Matlab code, and it uses a JIT compiler based on LLVM:

* source code, here: http://git.io/v0O7z (see comment there)

The test ran at https://juliabox.org (Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz) using Julia v0.5+; it’s fast!

7 => 0.000028
8 => 0.000108
9 => 0.000463
10 => 0.002146
11 => 0.010646
12 => 0.057603
13 => 0.334600
14 => 2.055078
15 => 13.480449
16 => 97.192552
17 => 720.314676

It’s still a young language, but it’s modern, already fast, extremely well designed and actively developed (with other high-level features like true hygienic Lisp-like macros, parallel and concurrent programming, etc.).