Even expensive, top-of-the-line hardware is fallible. Last night (at the time of writing) my main workstation’s PSU burned. I mean, not soft-failed and powered down, I mean burned. With the acrid smell filling the room, I knew something had gone very wrong the instant I entered my study. I found my computer powered down, non-responsive. I wasn’t too worried because I knew that even if the computer was dead for good, I would not lose much data since, you know, I have backups.
Are you capable of surviving your own little Data Hiroshima?
So, after a few quick tests, I gather that the computer is dead for good. I don’t know the extent of the damage, but judging by the smell, it isn’t good. Inspecting the computer with a flashlight (well, yeah, maybe I watched too many CSI episodes), I gather that the motherboard and other components are not damaged: no burn marks and no exploded capacitors. So I remove the drives and take the machine to a local computer shop to have a new power supply unit installed.
When I got the computer back, I ran an extensive memory test to make sure the RAM survived. I then had the hard drives run S.M.A.R.T. self-tests using the smartctl tool (part of smartmontools, which is already in the Ubuntu repositories). Apparently, the computer survived the failure with no detectable problems. However, these tests take a good while (several hours), so in the meantime I disassembled the faulty PSU to get a good look at it.
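For reference, here is a minimal sketch of the smartctl invocations involved. The device path is a placeholder (yours may differ), the commands normally need root, and the snippet guards against the tool or device being absent:

```shell
DRIVE=/dev/sda   # placeholder; list your actual drives with lsblk

if command -v smartctl >/dev/null 2>&1 && [ -b "$DRIVE" ]; then
    smartctl --test=long "$DRIVE"     # start a full-surface self-test; returns immediately
    smartctl --log=selftest "$DRIVE"  # check progress and past self-test results
    smartctl --health "$DRIVE"        # quick overall SMART health verdict
else
    echo "smartctl or $DRIVE not available; install the smartmontools package"
fi
```

The long test runs in the drive’s firmware in the background, which is why checking the self-test log afterwards is a separate step.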
Not a very nice sight.
Years ago, when I was still using Windows, I would have been very unnerved by this event because 1) you cannot easily transfer a Windows OS from one computer to another by simply exchanging drives and 2) I wasn’t too hot on redundancy and backups.
The first problem is largely solved by Ubuntu. Moving a drive from one machine to another of the same general architecture (e.g., from AMD64 to AMD64) causes very few problems: change the Ethernet adapter number from eth0 to eth1 and possibly reconfigure the graphics drivers (using sudo dpkg-reconfigure xserver-xorg). Otherwise, your OS will boot just fine, unlike Windows, which will probably BSOD if drivers are missing or fail to initialize. I have no idea what a hard drive transplant does to the Windows Genuine Advantage software, but I’m sure you won’t like it.
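The eth0-to-eth1 renumbering happens because udev on Ubuntu releases of that era remembers the old machine’s network card by its MAC address. Instead of editing your configuration to use eth1, one way to reset the numbering, assuming the conventional rules-file path, is to remove the cached mapping:

```shell
# udev caches NIC-name-to-MAC mappings in this file on older Ubuntu
# releases; removing it (it is regenerated at boot) frees up eth0 again.
RULES=/etc/udev/rules.d/70-persistent-net.rules

if [ -f "$RULES" ]; then
    rm "$RULES"   # needs root; reboot afterwards
else
    echo "no persistent-net rules file at $RULES"
fi
```

After a reboot, the transplanted system’s network card should come up as eth0 again.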
The second problem is solved by my systematic backing up of my data. You may be wondering right now just how reliable your own backup strategy is. Let us consider a few:
- Copies of files on the same disk. This somewhat protects you from mistakes, as a botched file can be recovered from a different location on the same disk. If the disk fails, you lose everything.
- Copies of files on different disk(s), same computer. This provides some protection against disk failure. RAID arrays (at various levels) offer protection against individual drive failure by spreading redundancy across many drives in the same machine. If the machine itself is heavily damaged, you may still lose all data; for example, a bad PSU that shorts everything, including enough drives to defeat the RAID redundancy.
- Copies on a different machine, on-site. Periodic synchronisation of files against another file system protects you from the catastrophic failure of a single machine, whether or not that machine has many drives and redundancy, but does not protect you against a site-wide catastrophic event, however unlikely, such as, well, I don’t know, flash flooding or fire.
- Copies on removable media or a machine off-site. Periodic synchronisation of files against removable media (something like, say, a USB hard drive that you keep off-site, or that you bring from a secure location to exchange with the current copy). You make a full (or incremental) backup onto one medium, then go to the secure location and exchange it for the second medium, which you bring back for the next backup. In this way, you minimize the time both copies spend at the same location, so even if the site is nuked from orbit, the other copy should survive in the secure location.
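To illustrate the synchronisation step common to the last two schemes, here is a minimal sketch of a dated, incremental backup using rsync’s --link-dest option, which hard-links unchanged files against the previous snapshot. All paths are placeholders (in real use, DEST would be the removable drive’s mount point), and the script falls back to a plain copy if rsync is absent:

```shell
SRC="${TMPDIR:-/tmp}/my-data"       # placeholder for your data directory
DEST="${TMPDIR:-/tmp}/usb-backup"   # placeholder for the backup medium's mount point
TODAY=$(date +%F)                   # e.g. 2009-03-14; one snapshot directory per day

mkdir -p "$SRC" "$DEST"
if command -v rsync >/dev/null 2>&1; then
    # Find the most recent snapshot (if any) and hard-link unchanged
    # files against it, so each dated directory looks like a full
    # backup but only changed files cost disk space.
    LATEST=$(ls -1 "$DEST" | tail -n 1)
    rsync -a ${LATEST:+--link-dest="$DEST/$LATEST"} "$SRC/" "$DEST/$TODAY/"
else
    cp -a "$SRC" "$DEST/$TODAY"     # plain full copy as a fallback
fi
```

Each run leaves a browsable, dated directory, so restoring a file is just a copy; deleting old snapshots reclaims only the space of files unique to them.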
You do realize that none of those techniques is bad in itself, and that they work better when you combine them. You may keep working copies in two locations on your file system (maybe using a revision control system), which is protected by RAID, which is periodically synchronized with a remote but on-site computer for fast retrieval, which is in turn periodically backed up to the off-site media should something really bad happen.
I must confess that I do most of the above. I do not have machines configured with RAID drives, but I do sync them nightly to a dedicated backup machine. I do make copies on removable drives, and I do exchange them once in a while with copies kept in a secure location. You may think it’s stupid, that I cannot possibly have data that’s worth that much trouble. Well, of course, I beg to differ. Not only do I have lots of source code, test data, and music, I also have a number of documents that I cannot get back from other sources: the LaTeX source code of my Ph.D. thesis and many other texts, the 30 000 or so pictures I shot over a period of several years, and countless other little things.
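A nightly sync like the one described is easy to drive from cron. The entry below is only a sketch: "backuphost" and both paths are placeholder names, and it assumes passwordless SSH key authentication to the backup machine is already set up:

```
# m h dom mon dow  command        (edit with: crontab -e)
0 3 * * * rsync -az --delete /home/me/data/ backuphost:/srv/backups/workstation/
```

The --delete flag makes the mirror exact, which is good for a true mirror but means an accidental deletion propagates on the next run; pair it with the dated-snapshot scheme above if that worries you.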
As our lives become increasingly dependent on technology—whether it’s our online presence on social networking sites or our use of digital photography to record our memories—it makes plenty of sense to make sure that none of this information is lost forever because of a faulty computer.
So, start training for your backup jutsu right now.