Leafing Through Old Notes

Leafing through old notes, I found some loose sheets that must have been, at some time, part of my notes of my lectures on external data structures. I remember the professor. He was an old man—I mean old, well in his 70s—and he taught us a lot about tapes and hard drives and all those slow devices. As a youth, I couldn’t care less about the algorithms for tapes, but in retrospect, I now know I should have paid more attention. So, anyway, the sheets contains notes on how to compute average rotational latency and average (random) seek time.

Rotational latency is the time data takes to rotate under the hard drive read head. The disk rotates in only one way, (from right to left in the figures) and quite independently of the user’s needs. If you’re lucky and it so happens that the current sector is the one you want to read, you have “zero” delay. If the sector you want to read just passed, you have to to wait for a full rotation until it comes into reach again. For the sake of example, let us consider a ten sector per track hard drive. Then, in our case, nine sectors (excluding the one currently under the read head) would have to pass to get the data we want.

Say that the current sector is sector 3 (from 0 to 9), the read distance to all sectors is given by:

If the current sector is sector 6, then the read distance looks like this:

So, whatever the current sector, the distance map always contains distances 0 to 9, only in a cyclic permutation. The average case is therefore given by the following. First, we have:

$L_0 = \sum_{i=0}^{n-1}i = \displaystyle\frac{1}{2}n(n-1)$

If we start at zero. If we want to count the current sector as having a wait of 1, it becomes:

$L_1 = \sum_{i=1}^{n}i = \displaystyle\frac{1}{2}n(n+1)$

The average wait is then:

$\bar{L}_0 = \displaystyle\frac{1}{n}\displaystyle\frac{1}{2}n(n-1)=\displaystyle\frac{1}{2}(n-1)$

or:

$\bar{L}_1 = \displaystyle\frac{1}{n}\displaystyle\frac{1}{2}n(n+1)=\displaystyle\frac{1}{2}(n+1)$

The actual delay (in wall-clock time) depends on the actual rotation speed, but that’s an implementation detail. In the figures the disk had ten sectors per track, but in real hard drives, it may be much higher.

*
* *

The other interesting metric is the average track to track random seek time. While reading files, the hard drive head must move across tracks. In a real file-system, displacements are not random at all. They depend on how the file system places data, and it normally tries to make most of the files contiguous on disk. That means that a file may spread across many tracks, but many of the tracks will be adjacent so a sequential read will only ask the disk to move the read head to the next track—a jump of exactly one track—when the current track is done reading. Realistic models of disk accesses given a specific file system are of course quite complex… or at least not that self-evident.

But we can guess what happens if you run windows suppose that a file system places data uniformly randomly on the disk; the worst possible case.

So what would such a displacement look like? If we take the same kind of diagram we used for sectors for tracks; we get a distance map like this one:

Here we assume that the disk has 10 tracks (OK, let’s say it’s 1984 all over again) and the read head is over track 4—starting at zero again. The figures gives the number of hops it takes to reach a given track from track 5. Would have we had the read head over track 2, the distance map looks like:

So this time it’s not merely a rotation of zero to nine, it’s something else. If we look at all possible read head positions we get the following distance maps:

We start to see a pattern here! Packing all those, we get the following:

which is a band matrix. Elements on diagonals are all the same:

And the lower triangular part is the same as the upper triangular part:

So to compute the average (random) seek time, we have to compute the sum of the matrix divided by $n^2$ (because there are $n^2$ possible seeks, including seeks of 0 displacement; if you want to exclude those, divide by $n^2-n = n(n-1)$ instead). Since both halves of the matrix are symmetric, their sum is necessarily the same. So twice the sum of either triangular part is the sum of the whole matrix.

The sum is given by:

$S_n=2\sum_{i=1}^{n} \sum_{j=0}^{n-1}j$

The inner sum gives the sum of the $i$th column. The outer sum computes the sums of the sum of each column. Let us simplify this expression:

$S_n=2\sum_{i=1}^{n} \sum_{j=0}^{n-1}j$

$S_n=2\sum_{i=1}^{n} \displaystyle\frac{1}{2}i(i-1)$

Using the identity:

$\sum_{i=1}^{n} i= \displaystyle\frac{1}{2}n(n+1)$

We get:

$S_n=2\sum_{i=1}^{n}\displaystyle\frac{1}{2}i(i-1)$

$S_n=\sum_{i=1}^{n} i(i-1)$

$S_n=\sum_{i=1}^{n} i^2-i$

$S_n=\sum_{i=1}^{n} i^2 - \sum_{i=1}^{n}i$

Using the identity:

$\sum_{i=1}^{n} i^2 = \displaystyle\frac{1}{6}n(n+1)(2n+1)$

We continue:

$S_n=\displaystyle\frac{1}{6}n(n+1)(2n+1)-\displaystyle\frac{1}{2}n(n+1)$

$S_n=n(n+1)\left(\displaystyle\frac{1}{6}(2n+1)-\displaystyle\frac{1}{2}\right)$

$S_n=\displaystyle\frac{1}{6}n(n+1)\left(2n+1)-3\right)$

$S_n=\displaystyle\frac{1}{6}n(n+1)(2n-2)$

$S_n=\displaystyle\frac{1}{6}n(n+1)2(n-1)$

$S_n=\displaystyle\frac{1}{3}n(n+1)(n-1)$

$S_n=\displaystyle\frac{1}{3}n(n^2-1)$

$S_n=\displaystyle\frac{n(n^2-1)}{3}$

And for the average:

$\displaystyle\frac{S_n}{n^2}=\displaystyle\frac{n(n^2-1)}{3n^2}$

$\displaystyle\frac{S_n}{n^2}=\displaystyle\frac{n^2-1}{3n}$

$\displaystyle\frac{S_n}{n^2}=\displaystyle\frac{1}{3}\left(n-\displaystyle\frac{1}{n}\right)\approx\displaystyle\frac{1}{3}n$

…whenever $n$ is large.

So $\sim \displaystyle\frac{1}{3}n$ is the average worst case for seeks. That’s pretty bad, and that’s also a pretty coarse upper bound. Real file system will reorder reads and write to minimize seek time on average. How it’s done is file system specific. I wonder how ReiserFS or Ext4 approach this problem…

7 Responses to Leafing Through Old Notes

1. pete says:

It’s 1984, I’m a veteran int 13h programmer, and I’m interleaving now.

But back in the 21st Century, LBA-addressing prevents an OS from addressing this problem (not that any I know of ever did), so we make files contiguous, read ahead, and leave the problem to the hard disk and its gratuitous on-controller cache. Incidentally, that means reading a file sequentially is faster than accessing it randomly or via memory-mapping. There’s also a lesson about solving problems at the right abstraction layer: having the hard disk controller optimise disk access, rather than trying to fix it at the file system layer.

• Steven Pigeon says:

Well, yes, and no.

(First, I must say the notes date back from the late 80s, or maybe 1990, a lot of things remained to be invented.)

And, no, the on-controller disk cache is minuscule compared to what you may want from your drive. Even if it has 128 MB of cache (which, of course, none have) you will want to place large files in a contiguous mapping, logical and physical. Your cache, or even logical block mapping, can’t help you with a stupid file system.

Otherwise, why bother with a defragmentation utility ?

2. Cyril PINEAU says:

Very interesting post Steven … And I am, once will not hurt, able to understand.

Some features of ext4 can clearly minimize this problem by improving the decision phase with the delayed allocation and the multiblock allocator.
However, the inclusion within the file system of algorithms designed to minimize fragmentation continually rearranging the data seems to be a good approaches too… as Apple had begun to do with HFS +?

3. Steven Pigeon says:

In fact, I think you’d want the file system to be as smart as possible about file placement. Furthermore, a lot of intelligence can be built in the file system that cannot be built effectively in the drive itself. For example, the file system could use access data to decide whether or not it is worthwhile to defrag a file (by moving it in a region where it is contiguously stored) or whether it doesn’t matter because rarely accessed.

I don’t know much about HFS, so can’t comment on this.

4. lemonleopard says:

(Well I was only tapping out my first basic program in 1984; it wasn’t till three or four years later that I was accidentally wiped my hard disk with int 13h. Incidentally it is quite enlightening to address a floppy controller directly on an 8-bit micro and having to manually scan for the byte-signature that delimits a sector’s start.)

Anyway, I think you’ve misunderstood. My point is that that, yes, we do defrag, because the controller ensures contiguous access will be fastest; so good filesystems aim to keep things contiguous. What filesystems don’t worry about is the the order of individual sectors and how they are arranged and how to optimise access to that; those kind of details are taken care of by the hard disk controller. In fact as we head towards flash drives, or hard disks that contain flash drives, designing file-systems specifically for the short comings of spinning platters seems an anachronism. And, as I said, I can see this problem again and again in my code, where I have made the filesystems juggle sectors rather than abstracting it.

PS I still think having a cache on a disk is revolutionary, let alone ones that can be measured in MB.

• Steven Pigeon says:

Yes, I agree, mass storage that behaves more like RAM will mean that some of the rotating-thingie-based decisions in file systems will be obsolete. Maybe, even, it’ll be more efficient to scatter-write, who knows?

5. lemonleopard says:

PS: I’m pete from above, I’ve just signed up for a blog, and, oops, have suffered form unintended context collapse.