Apparently, holding a warm beverage helps to promote an image of personal warmth. Well, if you want to do so, do it with style! Be noticed! Use the six-layered coffee™!
The LP64 model and the AMD64 instruction set
28/10/2008
Remember the old days, when you had five or six “memory models” to choose from when compiling your C program? Memory models allowed you to choose from a mix of short (16-bit) and long (32-bit) offsets and pointers for data and code. The tiny model, if I recall correctly, made sure that everything (code, data, and stack) would fit snugly in a single 16-bit segment.
With the advent of 64-bit computing on the x86 platform with the AMD64 instruction set, we find ourselves in a somewhat similar position. While the tiny memory model disappeared (phew!), we still have to choose between different compilation models, although this time they do not support mixed offset/pointer sizes. The new models, such as LP32 or ILP64, specify the sizes of int, long, and pointers. Linux on AMD64 uses the LP64 model, where int is 32 bits, and long and pointers are 64 bits.
Using 64-bit pointers costs a bit more memory for each pointer, but it also opens new possibilities: more than 4 GB allocated to a single process, and the ability to memory-map files that exceed 4 GB in size. 64-bit arithmetic also helps some applications, such as cryptographic software, run up to twice as fast in some cases. AMD64 mode also doubles the number of SSE registers available, potentially enabling significant performance improvements in video codecs and other multimedia applications.
However, one might ask what impact LP64 has on a typical C program. Is LP64 the best compilation model for AMD64? Will you get a speedup from replacing int (or int32_t) with long (or int64_t) in your code?
Everyday Origami
21/10/2008
Ever found yourself with a CD or DVD without a sleeve to protect it? In this post, I present a fun and simple origami solution to the sleeveless DVD problem. While origami is often associated with sophistication and lots of spare time, it can serve in our daily lives, sometimes in surprising ways.
Origami (from the Japanese 折り紙, literally folding (oru) paper (kami)) is a notoriously difficult art form in which pieces of paper are folded, avoiding cuts whenever possible, into the shapes of animals or other objects. It is an art sometimes pushed to incredible levels.
Spiking Makefiles with BASH
14/10/2008
The thing with complex projects is that they very often require complex build scripts. The build script for a given project can be a mix of Bash, Perl, and Make scripts. It is often convenient to write a script that ensures that all the project’s dependencies are installed, of the right version, or that builds them if necessary.
We often also need much simpler things done from within the makefile, like generating a build number. If you use makefiles, you know just how fun they are to hack, and you probably do the same as I do: minimally modify the same makefile you wrote back in 1995 (or “borrowed” from somewhere else). In many cases, that makes perfect sense: it does just what it has to do. In this week’s post, I show how to interface (although minimally) Bash from Make.
Serialization, Binary Encodings, and Sync Markers
07/10/2008
Serialization, the process by which run-time objects are saved from memory to persistent storage (typically disk) or sent across the network, requires the objects to be encoded in some efficient, preferably machine-independent, encoding.
One could consider XML or JSON, which are both viable options whenever simplicity or human-readability is required, or if every serialized object has a natural text-only representation. JSON, for example, provides only a limited number of basic data types: number, string, bool, arrays, and objects. What if you have a binary object? The standard approach with text encodings is to use Base64, but this results in a 33% data expansion, since every 3 source bytes become 4 encoded bytes. Base64 uses a-z, A-Z, 0-9, +, /, and = as encoding symbols, entirely avoiding commas (,), quotes (both single and double), braces, parentheses, newlines, or other symbols likely to interfere with the host encoding, whether XML, JSON, or CSV.
What if you do not want (or cannot afford) the bloatification of your data incurred by XML or JSON, and are not using a language with built-in binary serialization? Evidently, you will roll your own binary encoding for your data. But to do so, one has to provide not only the serialization mechanisms for the basic data types (including, one would guess, the infamous “binary blob”) but also a higher-level syntax of serialization that provides for structure and, especially, error resilience.