Finding Collisions

30/03/2012

Some time ago, a friend was trying to find an efficient way (in both storage and time complexity) to find collisions for a (secure) hash function. Hashing random keys until you get a collision is reminiscent of von Mises’ birthday paradox.

In its simplest form, the birthday paradox states that, amongst (randomly) gathered people, the probability that (at least) two share a birthday increases counter-intuitively fast with the number of people gathered. This means that even if your hash function is random (i.e., strong), you may not have to try a very large number of random keys before finding collisions. Or do you?
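As a quick sketch of the probability involved (not from the post itself), the chance of at least one repeat among n uniform draws over d equally likely values can be computed directly from the complement, the probability that all draws are distinct:

```python
def collision_probability(n, d):
    """Probability that n uniform draws from d equally likely values
    contain at least one repeat: 1 - (d/d)((d-1)/d)...((d-n+1)/d)."""
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (d - i) / d
    return 1.0 - p_distinct

# Classic birthday setting: 23 people, 365 possible birthdays.
print(collision_probability(23, 365))   # ~0.507, already past even odds
```

The same formula applies to hashing: with a d-value hash, the probability passes 1/2 after only about 1.18·√d random keys, which is why collisions show up far sooner than intuition suggests.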

Read the rest of this entry »


Drawing Random Numbers from Disjoint Ranges (Generating Random Sequences V)

20/03/2012

In a number of previous installments of this series, I’ve discussed permutations, generating uniform random points in a triangle and on a sphere, the inversion method, and even recycling expensive random bits. In this entry, I will use the rejection method to generate random numbers from disjoint ranges.

For simplicity, let us consider only the uniform case, where all values are equally likely to be drawn. So instead of drawing a number x from a single range, say [1,10], we’re interested in the case where the set is composed of several intervals, something like [1,10] \cup [20,30] \cup [40,50]. We may think of drawing uniformly from [1,50] and retrying if we “fall in the gaps”, that is, if we draw a number in [11,19] or [31,39].
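A minimal sketch of that retry scheme, using the example ranges above (inclusive bounds, integer values assumed):

```python
import random

# The illustrative disjoint ranges from the text (inclusive bounds).
RANGES = [(1, 10), (20, 30), (40, 50)]

def draw_by_rejection(ranges, rng=random):
    """Draw uniformly over the union of inclusive integer ranges by
    sampling the enclosing range and retrying on a miss."""
    lo = min(a for a, _ in ranges)
    hi = max(b for _, b in ranges)
    while True:
        x = rng.randint(lo, hi)              # uniform on [lo, hi]
        if any(a <= x <= b for a, b in ranges):
            return x                          # hit: inside some range
        # miss: fell in a gap, draw again
```

Each accepted value is uniform over the union, but the expected number of tries is the total span divided by the number of allowed values—here 50/32, so about 1.6 draws per sample.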

At first, it doesn’t seem like a bad plan, especially if the “holes” are rather small and easy to miss—that is, if we have a good chance of hitting an allowable range in very few tries. If, on the other hand, the ranges we’re interested in look like [1,10] and [95,100], drawing from [1,100] really doesn’t sound like a good idea.

But what if we compacted the ranges?
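One plausible reading of “compacting” (the post’s actual scheme may differ) is to concatenate the ranges, draw a single uniform index over the total number of allowed values, and map it back—no rejection at all:

```python
import random
from bisect import bisect_right
from itertools import accumulate

RANGES = [(1, 10), (20, 30), (40, 50)]   # illustrative, inclusive

def draw_compacted(ranges, rng=random):
    """Draw uniformly over a union of disjoint inclusive integer ranges
    with no rejection: index into the concatenation, then map back."""
    sizes = [b - a + 1 for a, b in ranges]
    cumulative = list(accumulate(sizes))           # e.g. [10, 21, 32]
    i = rng.randrange(cumulative[-1])              # uniform on [0, total)
    k = bisect_right(cumulative, i)                # which range i lands in
    offset = i - (cumulative[k - 1] if k else 0)
    return ranges[k][0] + offset
```

Every draw succeeds in constant expected work (plus the binary search), however large the gaps are.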

Read the rest of this entry »


UEID (Unique Enough IDs, part 2)

13/03/2012

As part of an open-source project I’m working on (right now, we are still at the technical feasibility stage where we explore and eliminate technical risks, full disclosure will come later) we have to issue session numbers. They’re not session numbers in the usual sense, but they still need to be unique, and not amenable to simple attacks.

There are a couple of ways of generating unique session numbers. RFC 4122-compliant unique IDs are one possibility.
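For illustration (Python’s standard uuid module—not necessarily what the project uses), an RFC 4122 version-4 ID is a one-liner:

```python
import uuid

def new_session_id():
    """A fresh RFC 4122 version-4 UUID: 122 bits drawn from the OS's
    random source, so accidental collisions are negligible."""
    return uuid.uuid4()

sid = new_session_id()
print(sid)   # e.g. a string like 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'
```

Note that uniqueness alone doesn’t make an ID attack-resistant; that depends on the IDs being unpredictable as well, which is exactly the kind of requirement the post hints at.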

Read the rest of this entry »


Quadrature Encoding

06/03/2012

The nice thing about the trigonometric circle is that it is, indeed, a circle. But that’s also a disadvantage when you consider the implementation of functions manipulating angles, especially \mathrm{mod}~2\pi.

This is particularly a problem when you’re not so much interested in a specific angle (which is, at any given time, always well-defined) as in a series of angles varying in time. If you’re certain that your angle always stays within \pm\pi or within [0,2\pi], you do not expect problems. What if, on the contrary, the angle wraps around?
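A common trick for wrap-around-safe comparisons (a sketch, not necessarily the post’s approach) maps any angular difference back into (-\pi, \pi] by going through sine and cosine:

```python
import math

def angle_diff(a, b):
    """Signed smallest difference a - b, mapped to (-pi, pi].
    Immune to wrap-around at the 0 / 2*pi seam because atan2
    only ever sees the angle through sin and cos."""
    return math.atan2(math.sin(a - b), math.cos(a - b))

# Naive subtraction across the seam gives ~-6.08; the true gap is 0.2.
print(angle_diff(0.1, 2 * math.pi - 0.1))   # ~0.2
```

The naive difference fails precisely when the series of angles crosses the seam, which is the situation described above.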

Read the rest of this entry »


Trigonometric Tables Reconsidered

28/02/2012

A couple of months ago (already!) 0xjfdube produced an excellent piece on table-based trigonometric function computations. The basic idea was that you can look up the value of a trigonometric function rather than actually computing it, on the premise that computing such functions directly is inherently slow. Turns out that’s actually the case, even on fancy CPUs. He discusses the precision of the estimate as a function of the size of the table and shows that you don’t need a terribly large number of entries to get decent precision. He muses over the possibility of using interpolation to augment precision, speculating that it might be slow anyway.

I started thinking about how to interpolate efficiently between table entries but then I realized that it’s not fundamentally more complicated to compute a polynomial approximation of the functions than to search the table then use polynomial interpolation between entries.
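As a point of comparison (an illustrative sketch with an assumed table size of 256, not 0xjfdube’s code), table lookup with linear interpolation between entries might look like this:

```python
import math

N = 256                                   # table size (assumption)
STEP = 2 * math.pi / N
# One extra entry at the end avoids special-casing the wrap-around.
TABLE = [math.sin(i * STEP) for i in range(N + 1)]

def table_sin(x):
    """sin(x) via table lookup plus linear interpolation."""
    x = x % (2 * math.pi)
    t = x / STEP
    i = min(int(t), N - 1)    # guard against rounding x % 2*pi up to 2*pi
    frac = t - i
    return TABLE[i] + frac * (TABLE[i + 1] - TABLE[i])
```

With 256 entries, linear interpolation brings the worst-case error down to roughly (2\pi/256)^2/8 \approx 7.5\times 10^{-5}, far better than nearest-entry lookup with the same table.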

Read the rest of this entry »


Flac vs Lame

21/02/2012

In a previous post I discussed lossless audio encoding and presented a Bash script using flac to rip and process CD audio files. I also commented on how the psycho-acoustic model used in an MP3 encoder will dominate encoding as bit-rate increases, without much quantitative evidence. In this post, I present some.

Read the rest of this entry »


The Minimalist Calculus Cheat Sheet

14/02/2012

We do not know closed-form solutions for all optimization problems, even when they are somewhat innocent-looking. One of the many possible methods in such a case is to use (stochastic) gradient descent to iteratively refine the solution to the problem. This involves the computation of… yes, the gradient.

In its simplest form, the gradient descent algorithm computes the gradient of an objective function relative to the parameters, evaluated on all training examples, and uses that gradient to adjust the model’s parameters. The gradient of a (not necessarily objective) function f has the general form
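As an illustrative sketch of that simplest form (made-up data, a single parameter w, and a fixed learning rate—none of it from the post):

```python
# Batch gradient descent for least squares y ~ w*x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]        # roughly y = 2x (made-up data)

w, rate = 0.0, 0.01
for _ in range(500):
    # Gradient of the objective sum_i (w*x_i - y_i)^2 with respect to w,
    # evaluated over ALL training examples (the "batch").
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys))
    w -= rate * grad             # step against the gradient

print(w)                         # close to 2
```

The stochastic variant differs only in evaluating the gradient on one (or a few) randomly chosen examples per step instead of the whole training set.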

Read the rest of this entry »


(Random Musings) On Floats and Encodings

31/01/2012

The float and double floating-point data types have been present for a long time in the C (and C++) standard. While neither the C nor the C++ standard enforces it, virtually all implementations comply with IEEE 754—or try very hard to. In fact, I know of no implementation today that uses something very different. But IEEE 754-type floats are aging. GPUs started adding extensions such as short floats for evident reasons. Should we start considering adding new types on both ends of the spectrum?

The next step up, the quadruple-precision float, is already part of the standard but, as far as I know, not implemented anywhere. Intel x86 does have something in between, an internal 80-bit float format, the so-called extended precision, but it’s not really standard, as it is not sanctioned by the IEEE standards, and, surprisingly enough, not all that well supported by the instruction set. It’s sometimes exposed as the long double C type. But, anyway, what’s in a floating point number?
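To peek at what is in one, here is a small sketch (not from the post) that unpacks a 32-bit IEEE 754 float into its three fields:

```python
import struct

def float32_fields(x):
    """Unpack an IEEE 754 single into (sign, biased exponent, mantissa)."""
    (bits,) = struct.unpack('>I', struct.pack('>f', x))
    sign     = bits >> 31            # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF       # 23 stored bits (implicit leading 1)
    return sign, exponent, mantissa

print(float32_fields(1.0))    # (0, 127, 0): +1.0 * 2^(127-127)
print(float32_fields(-2.0))   # (1, 128, 0): -1.0 * 2^(128-127)
```

The 1/8/23 split is exactly what changes between formats: short floats shrink the exponent and mantissa, while extended and quadruple precision grow them.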

Read the rest of this entry »


Medians (Part III)

24/01/2012

So in the two previous parts of this series, we have looked at the selection algorithm and at sorting networks for efficiently determining the (sample) median of a series of values.

In this last installment of the series, I consider an efficient (but approximate) algorithm based on heaps to compute the median.
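For reference, the classic exact two-heap running median (the post’s approximate heap-based algorithm may well differ) can be sketched as follows—a max-heap holds the lower half of the values seen so far and a min-heap the upper half:

```python
import heapq

class RunningMedian:
    """Exact streaming median with two heaps: a max-heap (values negated,
    since heapq is a min-heap) for the lower half, a min-heap for the upper."""
    def __init__(self):
        self.lo, self.hi = [], []        # max-heap (negated), min-heap

    def add(self, x):
        heapq.heappush(self.lo, -x)
        # Keep every element of lo <= every element of hi.
        heapq.heappush(self.hi, -heapq.heappop(self.lo))
        if len(self.hi) > len(self.lo):  # rebalance: lo holds the extra
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def median(self):
        if len(self.lo) > len(self.hi):
            return -self.lo[0]
        return (-self.lo[0] + self.hi[0]) / 2
```

Each insertion costs O(log n) and the median is read off the heap tops in O(1), at the price of storing all n values—the storage an approximate scheme would try to avoid.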

Read the rest of this entry »


Lossless Coding of CD Audio

17/01/2012

Once upon a time, I discussed how to pick a bit-rate for MP3 while considering re-ripping all my CDs. But if I’m to re-rip everything, I might as well do it one last time and use lossless compression.

In this post, we’ll discuss the simple script I cooked up to do just that, a bit about how Flac works, and how it compares to MP3.

Read the rest of this entry »