Preparing lecture notes (or videos) sometimes brings you to revisit things you’ve known for a while, but haven’t really taken time to formalize properly. One of those things is fixed-point arithmetic.
The C99 <stdint.h> header provides a plethora of type definition for platform-independent safe code: int_fast16_t, for example, provides an integer that plays well with the machine but has at least 16 bits. The int_fastxx_t and int_leastxx_t defines doesn’t guarantee a tight fit, they provide an machine-efficient fit. They find the fastest type of integer for that machine that respects the constraint.
But let’s take the problem the other way around: what about defines that gives you the smallest integer, not for the number of bits (because that’s trivial with intxx_t) but from the maximum value you need to represent?
Evaluating polynomials is not a thing I do very often. When I do, it’s for interpolation and splines; and traditionally those are done with relatively low degree polynomials—cubic at most. There are a few rather simple tricks you can use to evaluate them efficiently, and we’ll have a look at them.
While this sounds something like a shameful family secret, discrete inversion is only the finite-valued variation on the method of inversion for the generation of random numbers with a given distribution (as I’ve discussed quite a while ago here). The case we’ll consider here is a random variable with few possible outcomes, each with different odds
Sometimes, it’s the little things that make a big difference. For example, moving small, not machine-sized, pieces of data may be a problem. If it happens to be machine-sized (that is, the size of something like uint32_t), then the copy is readily done by a single (integer) copy. But what if it’s, say, 5 bytes?
The first solution that comes to mind is either a for-loop or a call to a library function like memcpy. The for-loop would give something like:
char * dest = ... char * src = ... for (std::size_t i=0;i<5;i++) dest[i]=src[i];
If you’re lucky, the compiler understands the small copy and optimizes. More likely, it will merely unroll the loop and generate something like:
14d4: 0f b6 07 movzx eax,BYTE PTR [rdi] 14df: 88 04 24 mov BYTE PTR [rsp],al 14e2: 0f b6 47 01 movzx eax,BYTE PTR [rdi+0x1] 14e6: 88 44 24 01 mov BYTE PTR [rsp+0x1],al 14ea: 0f b6 47 02 movzx eax,BYTE PTR [rdi+0x2] 14ee: 88 44 24 02 mov BYTE PTR [rsp+0x2],al 14f2: 0f b6 47 03 movzx eax,BYTE PTR [rdi+0x3] 14f6: 88 44 24 03 mov BYTE PTR [rsp+0x3],al 14fa: 0f b6 47 04 movzx eax,BYTE PTR [rdi+0x4] 14fe: 88 44 24 04 mov BYTE PTR [rsp+0x4],al
Let’s see how we can fix that!
Some numbers are easier to work with than others, especially for computer arithmetic—and even more so with weak processors. So let’s have a look at easy numbers that we can sometimes exploit to get faster code.
Last week, we had a look at how to implement a trool, or a tri-valued boolean what accepts true, false, and undefined. We remarked that the storage of an enum likely defaults to int, and that my poc wouldn’t play well with std::vector as that container has no specialization to deal with this new type.
A specialization would be interesting because we can do much better than using an integer to store three different values. We can do much, much better.
Initializing arrays, or any variable for that matter, is always kind of a problem. Most of the times, you can get away with a default value, typically zero in C#C++, but not always. For floats, for example, NaN makes much more sense. Indeed, it’s initialized to not a number: it clearly states that it is initialized, consciously, to not a value. That’s neat. What about integers? Clearly, there’s no way to encode a NaI (not an integer), maybe std::numeric_limits::min(), which is still better than zero. What about bools?
Bool is trickier. In C++, bool is either false or true, and weak typing makes everything not zero true. However, if you assign 3 to a bool, it will be “normalized” to true, that is, exactly 1. Therefore, and not that surprisingly, you can’t have true, false, and maybe. Well, let’s fix that.
Recently on Freenode channel ##cpp, I saw some code using an include-all-you-can header. The idea was to help beginners to the language, help them start programming without having to remember which header was which, and which headers are needed.
Is that really a good idea?
About a week ago, some dude drops on IRC that he’s beat memcpy “by a lot”. That’d be interesting, except that we couldn’t get neither code nor test methodology out of him. But, how hard can making a better memcpy be? Turns out, harder than you think!
If you think this is a typical case of “reinventing the wheel”, I mostly agree with you. But while reinventing will be hard, can improvements be made?