Leafing through old notes, I found some loose sheets that must have been, at some time, part of my notes from lectures on external data structures. I remember the professor. He was an old man—I mean old, well into his 70s—and he taught us a lot about tapes and hard drives and all those slow devices. As a youth, I couldn’t care less about the algorithms for tapes, but in retrospect, I now know I should have paid more attention. So, anyway, the sheets contain notes on how to compute average rotational latency and average (random) seek time.
Rotational latency is the time the data takes to rotate under the hard drive’s read head. The disk rotates in only one direction (from right to left in the figures), quite independently of the user’s needs. If you’re lucky and it so happens that the current sector is the one you want to read, you have “zero” delay. If the sector you want to read just passed, you have to wait almost a full rotation until it comes within reach again. For the sake of example, let us consider a ten-sector-per-track hard drive. Then, in the worst case, nine sectors (excluding the one currently under the read head) would have to pass to get the data we want.
Say that the current sector is sector 3 (sectors numbered 0 to 9); then the read distance to each sector is as follows (assuming sectors pass under the head in increasing numerical order): distance 0 to sector 3, 1 to sector 4, 2 to sector 5, and so on up to distance 6 to sector 9, then 7 to sector 0, 8 to sector 1, and 9 to sector 2.
If the current sector is sector 6, then the read distances look like this: 0 to sector 6, 1 to sector 7, 2 to sector 8, 3 to sector 9, 4 to sector 0, and so on up to 9 to sector 5.
So, whatever the current sector, the distance map always contains distances 0 to 9, only in a cyclic permutation. The average case is therefore given by the following. First, we have:

$$0+1+2+\cdots+9=\sum_{i=0}^{9} i = 45$$
if we start counting at zero. If we want to count the current sector as having a wait of 1, it becomes:

$$1+2+\cdots+10=\sum_{i=1}^{10} i = 55.$$
The average wait is then:

$$\frac{45}{10}=4.5$$

sectors, or $\frac{n-1}{2}$ for a track with $n$ sectors: on average, we wait about half a rotation.
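If you want to convince yourself that the map really is always a cyclic permutation, a few lines of Python will do. Here is a quick sketch (the `distance_map` helper is just for illustration), assuming sectors pass under the head in increasing numerical order:

```python
def distance_map(current, n=10):
    """Rotational distance, in sectors, from the current sector to every
    sector of the track."""
    return [(s - current) % n for s in range(n)]

print(distance_map(3))  # [7, 8, 9, 0, 1, 2, 3, 4, 5, 6]
print(distance_map(6))  # [4, 5, 6, 7, 8, 9, 0, 1, 2, 3]

# Every map is a cyclic permutation of 0..n-1, so the average wait is
# sum(range(10)) / 10 = 4.5 sectors, about half a rotation.
print(sum(distance_map(3)) / 10)  # 4.5
```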
The actual delay (in wall-clock time) depends on the actual rotation speed, but that’s an implementation detail. In the figures, the disk had ten sectors per track, but in real hard drives, the number of sectors per track may be much higher.
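To put actual numbers on it: at 7200 RPM, for instance, a full rotation takes 60/7200 s ≈ 8.3 ms, so the average rotational latency, at half a rotation, is roughly 4.2 ms.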
The other interesting metric is the average track-to-track random seek time. While reading files, the hard drive head must move across tracks. In a real file system, displacements are not random at all. They depend on how the file system places data, and it normally tries to make most of the files contiguous on disk. That means that a file may spread across many tracks, but many of the tracks will be adjacent, so a sequential read will only ask the disk to move the read head to the next track—a jump of exactly one track—when the current track is done reading. Realistic models of disk accesses given a specific file system are of course quite complex… or at least not that self-evident.
But we can guess what happens if you run Windows, er, if a file system places data uniformly at random on the disk; the worst possible case.
So what would such a displacement look like? If we draw, for tracks, the same kind of diagram we used for sectors, we get a distance map like this one: 4 hops to track 0, 3 to track 1, 2 to track 2, 1 to track 3, 0 to track 4 (the current track), then 1, 2, 3, 4, and 5 hops to tracks 5 through 9.
Here we assume that the disk has 10 tracks (OK, let’s say it’s 1984 all over again) and the read head is over track 4—starting at zero again. The map gives the number of hops it takes to reach a given track from track 4. Had the read head been over track 2, the distance map would look like this: 2, 1, 0, 1, 2, 3, 4, 5, 6, 7 hops to tracks 0 through 9.
So this time it’s not merely a rotation of zero to nine; it’s something else. If we look at all possible read head positions, we get one distance map per position: from head position $h$, reaching track $t$ takes $|h-t|$ hops.
We start to see a pattern here! Packing all those maps as the rows of a matrix, one row per head position, we get a $10\times 10$ matrix whose entry in row $h$, column $t$ is $|h-t|$,
which is a symmetric Toeplitz matrix: since each entry depends only on the difference $h-t$, elements along any given diagonal are all the same.
And the lower triangular part is the mirror image of the upper triangular part, since $|h-t|=|t-h|$.
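Again, a few lines of Python confirm the structure; this is just a sketch, with tracks numbered from zero:

```python
n = 10

# Seek-distance matrix: entry [h][t] is the number of track hops needed
# to move the read head from track h to track t.
M = [[abs(h - t) for t in range(n)] for h in range(n)]

print(M[4])  # [4, 3, 2, 1, 0, 1, 2, 3, 4, 5] -- head over track 4
print(M[2])  # [2, 1, 0, 1, 2, 3, 4, 5, 6, 7] -- head over track 2

# Constant diagonals: each entry depends only on h - t (Toeplitz structure).
assert all(M[h][t] == M[h + 1][t + 1]
           for h in range(n - 1) for t in range(n - 1))

# Symmetry: the lower triangular part mirrors the upper triangular part.
assert all(M[h][t] == M[t][h] for h in range(n) for t in range(n))
```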
So to compute the average (random) seek time, we have to compute the sum of the matrix divided by $n^2$ (because there are $n^2$ possible seeks, including seeks of 0 displacement; if you want to exclude those, divide by $n^2-n$ instead). Since the matrix is symmetric, both triangular halves necessarily have the same sum, and the diagonal contains only zeros. So twice the sum of either triangular part is the sum of the whole matrix.
The sum is given by:

$$S = 2\sum_{j=0}^{n-1}\sum_{i=0}^{j} i.$$

The inner sum gives the sum of the $j$-th column of the upper triangular part (column $j$ holds the distances $j, j-1, \dots, 1, 0$). The outer sum adds up the column sums, and the factor of 2 accounts for the lower triangular part. Let us simplify this expression.
Using the identity

$$\sum_{i=0}^{j} i=\frac{j(j+1)}{2},$$

we get

$$S=2\sum_{j=0}^{n-1}\frac{j(j+1)}{2}=\sum_{j=0}^{n-1}\left(j^2+j\right).$$

Using the identities

$$\sum_{j=0}^{n-1} j^2=\frac{(n-1)n(2n-1)}{6} \quad\text{and}\quad \sum_{j=0}^{n-1} j=\frac{n(n-1)}{2},$$

we find

$$S=\frac{(n-1)n(2n-1)}{6}+\frac{n(n-1)}{2}=\frac{n(n^2-1)}{3}.$$
And for the average:

$$\frac{S}{n^2}=\frac{n^2-1}{3n}\approx\frac{n}{3}$$
…whenever $n$ is large.
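A quick brute-force check (for $n=10$, the example disk) agrees with the closed form; again, just a sketch:

```python
n = 10
M = [[abs(h - t) for t in range(n)] for h in range(n)]

total = sum(sum(row) for row in M)
print(total, n * (n * n - 1) // 3)          # 330 330
print(total / n**2, (n * n - 1) / (3 * n))  # 3.3 3.3, close to n/3 = 3.33...
```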
So $\frac{n}{3}$ is the average worst case for seeks. That’s pretty bad, and that’s also a pretty coarse upper bound. Real file systems will reorder reads and writes to minimize seek time on average. How it’s done is file system specific. I wonder how ReiserFS or Ext4 approach this problem…