30/03/2012
Some time ago, a friend was looking for a way, efficient in both storage and time, to find collisions for a (secure) hash function. Hashing random keys until you get a collision is reminiscent of von Mises' birthday paradox.

In its simplest form, the birthday paradox states that, amongst (randomly) gathered people, the probability that (at least) two share a birthday increases counter-intuitively fast with the number of people gathered. This means that even if your hash function is random (i.e., strong), you may not have to try a very large number of random keys before finding collisions. Or does it?
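To make the effect concrete (a quick simulation of my own, not code from the post): hash random keys with SHA-256 truncated to a few bytes, which stands in for a smaller hash space, and count how many draws it takes before a digest repeats.

import os
import hashlib

def draws_until_collision(digest_bytes=4):
    """Hash random keys until two of them share a (truncated) digest."""
    seen = set()
    draws = 0
    while True:
        key = os.urandom(16)                              # a random 128-bit key
        h = hashlib.sha256(key).digest()[:digest_bytes]   # truncated digest
        draws += 1
        if h in seen:
            return draws                                  # first collision found
        seen.add(h)

# With 4-byte digests (2**32 possible values), the first collision typically
# shows up after roughly sqrt(pi/2 * 2**32), about 82 000 draws, as the
# birthday bound predicts.
print(draws_until_collision())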
Read the rest of this entry »
6 Comments |
algorithms, Mathematics | Tagged: Birthday, Birthday Paradox, Collision, Collisions, hash, Secure Hash, sha, von Mises |
Permalink
Posted by Steven Pigeon
20/03/2012
In a number of previous installments of this series, I've discussed permutations, generating uniform random points in a triangle and on a sphere, the inversion method, and even recycling expensive random bits. In this entry, I will use the rejection method to generate random numbers from disjoint ranges.

For simplicity, let us consider only the uniform case, where all values are equally likely to be drawn. So instead of drawing a number from a single range, say $[a,f]$, we're interested in the case where the set is composed of several disjoint intervals, something like $[a,b]\cup[c,d]\cup[e,f]$. We may think of drawing uniformly from $[a,f]$ and retrying if we "fall in the gaps", that is, if we draw a number in $(b,c)$ or $(d,e)$.
At first, it doesn't seem like a bad plan, especially if the "holes" are rather small and easy to miss, so that we have a good chance of hitting an allowable range in very few tries. For example, if the ranges we're interested in are tiny and very far apart, something like $[a,a+10]$ and $[b,b+10]$ with $b$ much larger than $a$, drawing from the whole enclosing range $[a,b+10]$ really doesn't sound like a good idea.
But what if we compacted the ranges?
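Here is a minimal sketch of the compaction idea as I understand it (not necessarily the post's implementation): concatenate the allowed intervals, draw a single uniform offset over their total length, and map that offset back into the interval it falls in, so no rejection is needed however large the gaps are.

import random

def sample_disjoint(intervals):
    """Draw uniformly from a union of disjoint integer intervals [lo, hi]."""
    lengths = [hi - lo + 1 for lo, hi in intervals]
    total = sum(lengths)
    offset = random.randrange(total)          # one draw over the compacted range
    for (lo, hi), length in zip(intervals, lengths):
        if offset < length:
            return lo + offset                # map the offset back into this interval
        offset -= length

# Example: two small ranges separated by a huge gap.
print(sample_disjoint([(0, 9), (10**9, 10**9 + 9)]))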
Read the rest of this entry »
Leave a Comment » |
algorithms, Mathematics, programming | Tagged: IP, IP v4, pseudo-random, random, ranges, uniform random |
Permalink
Posted by Steven Pigeon
13/03/2012
As part of an open-source project I'm working on (right now, we are still at the technical feasibility stage, where we explore and eliminate technical risks; full disclosure will come later), we have to issue session numbers. They're not session numbers in the usual sense, but they still need to be unique and not amenable to simple attacks.

There are a couple of ways of generating unique session numbers. RFC 4122-compliant unique IDs are one possible way.
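As an illustration (a minimal sketch, not necessarily the scheme retained in the project): Python's standard library can produce RFC 4122 version 4 UUIDs, which are built from random bits, or raw bytes can be pulled from the system entropy source directly.

import os
import uuid

# An RFC 4122 version 4 UUID: 122 random bits laid out in the standard format.
session_id = uuid.uuid4()
print(session_id)                 # a different value on each run

# Or roll your own token from the OS entropy pool (16 random bytes, hex-encoded).
raw_token = os.urandom(16).hex()
print(raw_token)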
Read the rest of this entry »
2 Comments |
algorithms, bit twiddling, hacks, programming, Python | Tagged: RFC 4122, session, session id, SSL, urandom, UUID |
Permalink
Posted by Steven Pigeon
06/03/2012
The nice thing about the trigonometric circle is that it is, indeed, a circle. But that is also a disadvantage when you consider the implementations of functions manipulating angles, especially around the wrap-around at $2\pi$.

This is particularly a problem when you're not so much interested in a specific angle (which is, at any given time, well-defined) as in a series of angles varying in time. If you're certain that your angle always stays within $[0,2\pi)$, or within $(-\pi,\pi]$, you do not expect problems. What if, on the contrary, the angle wraps around?
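As a minimal sketch of the kind of wrap-around handling involved (my own example, not the solution developed in the post), one can fold any angle back into $(-\pi,\pi]$ before comparing angles:

import math

def wrap_angle(theta):
    """Fold an arbitrary angle into the interval (-pi, pi]."""
    wrapped = math.fmod(theta, 2 * math.pi)   # now in (-2*pi, 2*pi)
    if wrapped > math.pi:
        wrapped -= 2 * math.pi
    elif wrapped <= -math.pi:
        wrapped += 2 * math.pi
    return wrapped

# Three full turns plus a quarter turn comes back to a quarter turn.
print(wrap_angle(6 * math.pi + math.pi / 2))   # ~1.5708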
Read the rest of this entry »
Leave a Comment » |
algorithms, machine learning, Mathematics, programming | Tagged: angle, π, Sphere, spherical coordinates, trigonometric circle, trigonometry, unit circle |
Permalink
Posted by Steven Pigeon
28/02/2012
A couple of months ago (already!) 0xjfdube produced an excellent piece on table-based trigonometric function computations. The basic idea is that you can look up the value of a trigonometric function rather than actually computing it, on the premise that computing such functions directly is inherently slow. Turns out that's actually the case, even on fancy CPUs. He discusses the precision of the estimate as a function of the size of the table and shows that you don't need a terribly large number of entries to get decent precision. He muses over the possibility of using interpolation to augment precision, speculating that it might be slow anyway.

I started thinking about how to interpolate efficiently between table entries, but then I realized that it's not fundamentally more complicated to compute a polynomial approximation of the functions than to search the table and then interpolate between entries.
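To make the idea concrete (a sketch with my own choice of degree, not the coefficients derived in the post): reduce the argument to an interval around zero and evaluate a short MacLaurin-style polynomial with Horner's scheme.

import math

def sin_poly(x):
    """Approximate sin(x) with a degree-7 MacLaurin polynomial after range reduction."""
    # Reduce x to [-pi, pi], where the truncated series behaves reasonably well.
    x = math.fmod(x, 2 * math.pi)
    if x > math.pi:
        x -= 2 * math.pi
    elif x < -math.pi:
        x += 2 * math.pi
    x2 = x * x
    # sin(x) ~ x - x^3/3! + x^5/5! - x^7/7!, evaluated with Horner's scheme.
    return x * (1 + x2 * (-1/6 + x2 * (1/120 - x2 / 5040)))

print(sin_poly(1.0), math.sin(1.0))   # very close near 0; the error grows toward +/- pi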
Read the rest of this entry »
1 Comment |
algorithms, assembly language, C, C-plus-plus, C99, embedded programming, Mathematics, programming | Tagged: cos, cosine, MacLaurin Series, Polynomial, Polynomial Approximation, SIN, Sine, Taylor Series |
Permalink
Posted by Steven Pigeon
21/02/2012
In a previous post, I discussed lossless audio encoding and presented a Bash script using flac to rip and process CD audio files. I also commented on how the psycho-acoustic model used in an MP3 encoder will dominate encoding as the bit-rate increases, but without much quantitative evidence. In this post, I present some.

Read the rest of this entry »
Leave a Comment » |
algorithms, Bash (Shell), data compression | Tagged: FLAC, Lossess Compression, Lossy Compression, MP3, psychoacoustic model, psychoacoustics |
Permalink
Posted by Steven Pigeon
14/02/2012
We do not know closed-form solutions for all optimization problems, even when they are somewhat innocent-looking. One of the many possible methods in such a case is to use (stochastic) gradient descent to iteratively refine the solution to the problem. This involves the computation of… yes, the gradient.

In its simplest form, the gradient descent algorithm computes the gradient of an objective function relative to the parameters, evaluated on all training examples, and uses that gradient to adjust the model's parameters. The gradient of a (not necessarily objective) function $f(x_1,\dots,x_n)$ has the general form

$\nabla f = \left( \dfrac{\partial f}{\partial x_1}, \dfrac{\partial f}{\partial x_2}, \dots, \dfrac{\partial f}{\partial x_n} \right)$.
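As a toy illustration of the update rule (my own example, not the model discussed in the post): one batch gradient descent step moves the parameters against the gradient of the objective, scaled by a learning rate. Here the objective is the mean squared error of a tiny linear regression.

import numpy as np

def gradient_descent_step(w, xs, ys, learning_rate=0.01):
    """One batch gradient descent step for least-squares linear regression."""
    predictions = xs @ w
    errors = predictions - ys
    grad = 2 * xs.T @ errors / len(ys)     # gradient of the mean squared error
    return w - learning_rate * grad        # move against the gradient

# Toy data: y = 3x, starting from w = 0.
xs = np.array([[1.0], [2.0], [3.0]])
ys = np.array([3.0, 6.0, 9.0])
w = np.zeros(1)
for _ in range(200):
    w = gradient_descent_step(w, xs, ys)
print(w)   # converges toward [3.]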
Read the rest of this entry »
1 Comment |
algorithms, machine learning, Mathematics | Tagged: Calculus, Derivative, gradient descent, Intregral, Leibniz, objective function, optimization, stochastic gradient descent |
Permalink
Posted by Steven Pigeon
31/01/2012
The float and double floating-point data types have been present for a long time in the C (and C++) standards. While neither the C nor the C++ standard enforces it, virtually all implementations comply with IEEE 754, or try very hard to. In fact, I do not know, as of today, of an implementation that uses something very different. But IEEE 754-type floats are aging. GPUs started adding extensions such as short floats for obvious reasons. Should we start considering adding new types at both ends of the spectrum?

The next step up, the quadruple-precision float, is already part of the standard but, as far as I know, not implemented in hardware anywhere. Intel x86 does have something in between for its internal float format, the so-called 80-bit extended precision, but it's not really standard, as it is not sanctioned by the IEEE standard, and, surprisingly enough, it is not particularly well supported by the instruction set. It is sometimes exposed as the long double C type. But, anyway, what's in a floating-point number?
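To peek at what's in an IEEE 754 number (a small sketch of my own): pack a value into its 64-bit double representation and pull out the sign, exponent, and mantissa fields.

import struct

def decompose_double(x):
    """Split an IEEE 754 double into its sign, biased exponent, and mantissa fields."""
    (bits,) = struct.unpack('>Q', struct.pack('>d', x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF          # 11-bit biased exponent (bias 1023)
    mantissa = bits & ((1 << 52) - 1)        # 52-bit fraction (implicit leading 1)
    return sign, exponent, mantissa

# 1.0 is stored as sign 0, biased exponent 1023 (i.e., 2**0), mantissa 0.
print(decompose_double(1.0))    # (0, 1023, 0)
print(decompose_double(-2.5))   # (1, 1024, 1 << 50)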
Read the rest of this entry »
1 Comment |
algorithms, C, C-plus-plus, data compression, data structures, hacks, machine learning, programming | Tagged: double, extended, float, IEEE 754, octuple, quadruple |
Permalink
Posted by Steven Pigeon
24/01/2012
So, in the two previous parts of this series, we looked at the selection algorithm and at sorting networks for efficiently determining the (sample) median of a series of values.

In this last installment of the series, I consider an efficient (but approximate) algorithm based on heaps to compute the median.
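For reference, here is a common heap-based scheme (the exact running median with two heaps, not necessarily the approximate med-heap structure developed in the post): a max-heap holds the lower half of the values and a min-heap the upper half, and the median sits at their tops.

import heapq

class RunningMedian:
    """Running median of a stream: max-heap for the lower half, min-heap for the upper half."""
    def __init__(self):
        self.lower = []   # max-heap, stored as negated values
        self.upper = []   # min-heap

    def add(self, x):
        if not self.lower or x <= -self.lower[0]:
            heapq.heappush(self.lower, -x)
        else:
            heapq.heappush(self.upper, x)
        # Rebalance so the heaps differ in size by at most one.
        if len(self.lower) > len(self.upper) + 1:
            heapq.heappush(self.upper, -heapq.heappop(self.lower))
        elif len(self.upper) > len(self.lower):
            heapq.heappush(self.lower, -heapq.heappop(self.upper))

    def median(self):
        if len(self.lower) > len(self.upper):
            return -self.lower[0]
        return (-self.lower[0] + self.upper[0]) / 2

rm = RunningMedian()
for v in [5, 1, 9, 3, 7]:
    rm.add(v)
print(rm.median())   # 5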
Read the rest of this entry »
7 Comments |
algorithms, C, C-plus-plus, data structures, hacks, programming | Tagged: heap, max-heap, med-heap, median, min-heap, selection |
Permalink
Posted by Steven Pigeon
17/01/2012
Once upon a time, I discussed how to pick a bit-rate for MP3 while considering re-ripping all my CDs. But if I'm to re-rip everything, I might as well do it one last time and use lossless compression.

In this post, we'll discuss the simple script I cooked up to do just that, a bit on how FLAC works, and how it compares to MP3.
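As a small sketch of the kind of batch encoding involved (my own example; the actual script and its options may differ): walk over the ripped WAV files and hand each one to the flac command-line encoder at its highest compression setting.

import pathlib
import subprocess

# Encode every .wav file in the current directory to FLAC at maximum compression.
for wav in sorted(pathlib.Path('.').glob('*.wav')):
    subprocess.run(['flac', '--best', str(wav)], check=True)   # produces foo.flac next to foo.wav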
Read the rest of this entry »
1 Comment |
algorithms, Bash (Shell), data compression, embedded programming, Mathematics, programming | Tagged: CD, Entropy, Golomb Codes, GSM, Linear Prediction, music, psychoacoustic model, psychoacoustics, sound, Speech, Speech Codec |
Permalink
Posted by Steven Pigeon