Finding Collisions

March 30, 2012

Some time ago, a friend was trying to find an efficient way (storage and time complexity) to find collisions for a (secure) hash function. Hashing random keys until you get a collision is reminiscent of von Mises’ birthday paradox.

In is simplest form, the birthday paradox states that, amongst (randomly) gathered people, the probability that (at least) two share a birthday increases counter-intuitively fast with the number of gathered people. This mean that even if your hash function is random (i.e., strong), you may not have to try a very large number of random keys before finding collisions. Or does it?

Read the rest of this entry »


Hash Functions (checksums, part II?)

November 22, 2011

On a number of different occasions, I briefly discussed Hash Functions, saying that if a hash function needn’t be very strong if the intend is to use it for look-up, contrary to cryptographic applications. But, unsurprisingly, it’s rather difficult to get a good fast hash function.

Coming up with a very good hash function isn’t easy, but we can at least make an effort to build one by understanding what makes a better hash function, especially by explicitly testing them. Let us have a try at building a good hash function (for look-up) from scratch.

Read the rest of this entry »


The Complexity of Containers

October 11, 2011

One thing that came on topic rather often recently, and especially in connection with Python, was data structure complexity and what kind of performance operations should offer. However, it’s not as simple as it sounds. First, algorithm operations to be performed dictate, to a great extent, what container, or abstract data structure, should be favored.

But sometimes, these data structures lend themselves to alternate implementations that have different run-time and memory complexities. Maybe we should have a look.

Read the rest of this entry »



Data Insecurity

May 25, 2010

It always amazes me to see how people put trust in their service providers. While in principle, there’s no real need to worry, careless implementation of services can really have dire consequences!

And it’s not like leaks and exploits are rare. Sometimes we hear about them, sometimes we don’t. Let’s consider these two (amongst those) I know about:

Read the rest of this entry »


Checksums (part I)

July 21, 2009

I once worked in a company specializing in embedded electronics for industrial applications. In one particular project, the device communicated through a RS-422 cable to the computer and seemed to return weird data once in a while, causing unwanted behavior in the control computer whose programming did not provide for this unexpected data. So I took upon myself to test the communication channel as it seemed that the on-board software was operating properly and did not contain serious bugs. I added a check-sum to the data packet and it turned out that some packets came in indeed corrupted despite the supposedly superior electrical characteristics of the RS-422 link.

After a few days’ work, I implemented the communication protocol that could detect and repair certain errors while reverting to a request to retransmit if the data was too damaged. I then started gathering statistics on error rate, number of retransmit, etc, and the spurious behavior on the controller’s side went away. My (metaphorically) pointy-haired boss opposed the modification because “we didn’t have any of these damn transmission errors until you put your fancy code in there“. Of course, this was an epic facepalm moment. I tried to explain that the errors have always been there, except that now they’re caught and repaired. Needless to say, it ended badly.

abstract-0002

Notwithstanding this absurd episode, I kept using check-sum to validate data whenever no other layer of the protocol took care of transmission errors. So, this week, let us discuss check-sums and other error detection algorithms.

Read the rest of this entry »


Recycling Random Bits

June 9, 2009

It is not uncommon that for large-scale simulations you need a large volume of high-quality (pseudo)random numbers. The relatively short period of the C standard library’s rand function may make it impractical for your purpose; and you might need to resort to stronger generators such as the Linux-provided /dev/random and /dev/urandom pseudo-devices. But calling those is a relatively expensive process as /dev/random may stall until its “entropy pool” is repleted.

The /dev/urandom is the unblocked version of /dev/random, but it is also demonstrably less random. Moreover, one has to access either pseudo-devices through the file system, which in certain case introduces a noticeable impact on performance. Using other sources of randomness such as a hardware random number generator may also incur significant delays.

To avoid being stalled, I propose a simple method that will allow you to recycle bits from an expensive pseudo-random source without polling it too often.

Read the rest of this entry »


More Blinking Lights (and a disgression)

April 28, 2009

In Blinking Lights I told you about how I feel the modern computer for its exterior, except for its screen, is boring. When I look at my Antec case, I see a large, silent black box, which, by its very definition, is uninteresting at best. Something like a rock that slowly dissipates heat.

However Bill Buzbee built a computer that has an interesting exterior, and a much more interesting interior: the Magic-1. The Magic-1 is a computer running at 4.something MHz, and is in the same computational power range as the original 8086 4.77 Mhz IBM PC, except with a more advanced instruction set.

The Magic-1 Computer

The Magic-1 Computer

Read the rest of this entry »


UEID: Unique Enough IDs

September 30, 2008

Generating unique, unambiguous, IDs for data is something we do often, but we do not always know what level of uniqueness is really needed. In some cases, we want to be really sure that two instances of the same ID identify two copies of the same object or data. In other cases, we only want to be reasonably sure. In other cases, yet, we just assume that collisions—two different objects yielding the same ID—are very unlikely, and, if the need be, we can proceed to further testing to establish equality.

There are many ways of generating IDs, each with varying levels of confidence on uniqueness and differing applications.

Read the rest of this entry »


Follow

Get every new post delivered to your Inbox.

Join 41 other followers