Being S.M.A.R.T. with Drives

As I mentioned before (here and here), you really can’t trust your hardware to stay healthy all by itself. It can overheat because of bad case design or dirty fans, or it can just burn out because of a bad PSU. It can also die of old age, which can show up as all kinds of weird symptoms, from random freezes to programs that crash all the time. You can test for bad RAM using the free Memtest86+, which is conveniently included on Ubuntu’s live CD, and you can test your drives using their built-in SMART capabilities.

SMART (or S.M.A.R.T.) stands for Self-Monitoring, Analysis, and Reporting Technology. It is basically extra sensors and firmware added to your hard disks so that they can detect hardware failures and monitor other conditions, such as the drive’s temperature. The tool of choice on Linux for accessing SMART status is smartmontools, which turned out to be most useful.

On Ubuntu (or any other Debian-based distro), the package is installed by invoking sudo apt-get install smartmontools. You may also want to install the postfix package at the same time: if you enable the SMART dæmon (smartd), it can use the Postfix mail system to warn you when something goes wrong. The dæmon can run periodic tests, monitors your drives’ health, and sends you an automated mail should something bad happen. It saved my life once. Well, it saved my data, anyway.
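The dæmon does this checking for you, so there is no need to script it yourself; still, to illustrate the idea, here is a minimal Python sketch that asks smartctl -H for the drive’s overall health verdict. The function names are hypothetical, and the sketch assumes the ATA output format, where the verdict appears on a line starting with "SMART overall-health self-assessment test result:" (SCSI drives report their health differently).

```python
import subprocess
from typing import Optional

# Assumption: for ATA drives, `smartctl -H` prints a line such as
# "SMART overall-health self-assessment test result: PASSED".
HEALTH_PREFIX = "SMART overall-health self-assessment test result:"


def overall_health(output: str) -> Optional[str]:
    """Return the verdict (e.g. "PASSED") found in `smartctl -H` output, or None."""
    for line in output.splitlines():
        if line.startswith(HEALTH_PREFIX):
            return line[len(HEALTH_PREFIX):].strip()
    return None


def check_drive(device: str) -> bool:
    """Hypothetical helper: run `smartctl -H` on `device` (needs root)."""
    # smartctl uses a bitmask exit status, so a nonzero code is not
    # necessarily an error; we look at the printed verdict instead.
    result = subprocess.run(
        ["smartctl", "-H", device], capture_output=True, text=True, check=False
    )
    return overall_health(result.stdout) == "PASSED"
```

Run as root, check_drive("/dev/sda") returns True when the drive reports PASSED. Unlike smartd, it runs once, keeps no history and sends no mail.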

Invoking sudo smartctl -a /dev/adrive will print the current SMART status of your drive, where adrive is sda or whichever drive you’re interested in. The first piece of interesting information is the drive’s identification. On one of my drives, it reads:

Serial Number:    WD-XXXXXXXXXXXX
Firmware Version: 01.00A01
User Capacity:    1,000,204,886,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun Oct 10 19:34:44 2010 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Should SMART support be disabled, enable it in your machine’s BIOS if possible; sudo smartctl -s on /dev/adrive may also turn it on. It may also be that the drive is not SMART-capable, which is unlikely if the drive is somewhat recent.
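If you want to check these identification fields from a script (for example, to confirm that SMART is enabled before relying on it), the "Key: value" lines are easy to pick apart. This is a minimal Python sketch with hypothetical function names; it assumes the layout shown above, which smartctl -i also prints on its own.

```python
from typing import Dict


def parse_info(output: str) -> Dict[str, str]:
    """Split "Key: value" lines of `smartctl -i` (or -a) output into a dict.

    Only the first colon separates key from value, so values such as
    times ("19:34:44") survive intact. Repeated keys (like the two
    "SMART support is" lines) are joined with "; ". Any other line that
    happens to contain a colon is picked up too, which is harmless here.
    """
    info: Dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if not sep or not value.strip():
            continue
        key, value = key.strip(), value.strip()
        info[key] = f"{info[key]}; {value}" if key in info else value
    return info


def smart_enabled(info: Dict[str, str]) -> bool:
    """True if one of the "SMART support is" lines says "Enabled"."""
    return "Enabled" in info.get("SMART support is", "").split("; ")
```

For the drive above, parse_info gives "01.00A01" for "Firmware Version", and smart_enabled returns True.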

The same report also lists all the performance and statistics registers (the SMART attributes) gathered by your drive. On the same drive, sudo smartctl -a /dev/adrive gives:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   110   110   021    Pre-fail  Always       -       7458
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       9
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       6877
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       8
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       13
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       961199
194 Temperature_Celsius     0x0022   118   111   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
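To make the table easier to act on, here is a minimal Python sketch (hypothetical names throughout) that turns its rows into records and flags two kinds of trouble: attributes whose normalized VALUE has dropped to or below THRESH (the usual SMART failure convention, ignored when THRESH is 0), and a nonzero raw count in counters that should normally stay at zero. The choice of counters in ERROR_COUNTERS is my assumption, not something smartctl defines.

```python
from dataclasses import dataclass
from typing import List

# Assumption: raw counters that should normally stay at 0
# (5 Reallocated_Sector_Ct, 196 Reallocated_Event_Count,
#  197 Current_Pending_Sector, 198 Offline_Uncorrectable).
ERROR_COUNTERS = {5, 196, 197, 198}


@dataclass
class Attribute:
    attr_id: int
    name: str
    value: int        # current normalized value
    worst: int        # lowest normalized value recorded so far
    thresh: int       # vendor threshold (0: informational only)
    kind: str         # TYPE column: "Pre-fail" or "Old_age"
    when_failed: str  # "-" unless the attribute has failed
    raw: str          # raw value, kept as text since its format varies


def parse_attributes(output: str) -> List[Attribute]:
    """Collect the attribute table rows from `smartctl -a` (or -A) output."""
    attrs = []
    for line in output.splitlines():
        # Nine whitespace-separated columns, then RAW_VALUE (may contain spaces).
        fields = line.split(maxsplit=9)
        if len(fields) == 10 and all(f.isdigit() for f in (fields[0], *fields[3:6])):
            attrs.append(Attribute(
                attr_id=int(fields[0]), name=fields[1],
                value=int(fields[3]), worst=int(fields[4]), thresh=int(fields[5]),
                kind=fields[6], when_failed=fields[8], raw=fields[9].strip(),
            ))
    return attrs


def problems(attrs: List[Attribute]) -> List[str]:
    """Describe attributes at or below threshold and nonzero error counters."""
    found = []
    for a in attrs:
        if a.thresh > 0 and a.value <= a.thresh:
            found.append(f"{a.name}: value {a.value} <= threshold {a.thresh}")
        if a.attr_id in ERROR_COUNTERS and a.raw.split()[0] != "0":
            found.append(f"{a.name}: raw count {a.raw}")
    return found
```

Fed the table above, problems(parse_attributes(output)) returns an empty list; Load_Cycle_Count is not flagged because its threshold is 000.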

So this drive is in good health. It is at a happy 29°C, and it reports no errors of any kind. Had errors occurred, they would show in this report. Still, it is wise to run a long low-level test of the entire drive from time to time. Invoking

sudo smartctl -t long /dev/sda

will launch a long (full) test on your drive. smartctl will tell you how long it will take (likely a few hours for a 1TB drive). The good thing is that the test is carried out by the drive’s own firmware, so you can continue working normally. If your drive is healthy, you get (after a while, by invoking sudo smartctl -a /dev/adrive, or sudo smartctl -l selftest /dev/adrive for just the test log) a report such as:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%        17         -

Otherwise, you’d get something like:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read error         70%        17         19137113

The last column, LBA_of_first_error, gives the logical block address where the first error occurred.
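If you check the self-test log from a script, each entry can be split into its columns, and the LBA can be turned into a byte offset on the disk, which helps when you want to find out what lies at that spot. Below is a minimal Python sketch with hypothetical names; it assumes the description column is followed by at least two spaces, as in the logs above, and that the drive uses 512-byte logical sectors (adjust sector_size otherwise).

```python
import re
from typing import List, NamedTuple, Optional

# One entry of the "SMART Self-test log" table, e.g.
# "# 1  Extended offline    Completed: read error   70%   17   19137113".
# Assumption: the description is followed by at least two spaces.
ENTRY = re.compile(r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s+(\d+)%\s+(\d+)\s+(\d+|-)\s*$")


class SelfTest(NamedTuple):
    num: int
    description: str
    status: str
    remaining_pct: int
    lifetime_hours: int
    first_error_lba: Optional[int]  # None when the column shows "-"


def parse_selftest_log(output: str) -> List[SelfTest]:
    """Return the self-test log entries found in `smartctl -l selftest` output."""
    tests = []
    for line in output.splitlines():
        m = ENTRY.match(line.strip())
        if m:
            lba = m.group(6)
            tests.append(SelfTest(
                int(m.group(1)), m.group(2), m.group(3),
                int(m.group(4)), int(m.group(5)),
                None if lba == "-" else int(lba),
            ))
    return tests


def lba_to_byte_offset(lba: int, sector_size: int = 512) -> int:
    """Byte offset of a logical block, assuming `sector_size`-byte sectors."""
    return lba * sector_size
```

For the failed test above, the first error is at LBA 19137113, that is, byte offset 19137113 × 512 = 9,798,201,856, a bit under 10 GB into the disk.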

If that happens, my advice is to get all the data you can off this drive: while read errors do not mean imminent failure, they’re not a good sign. Better safe than sorry.

2 Responses to Being S.M.A.R.T. with Drives

  1. A P Geofrey says:

    Great post. I have a computer that suffers from one or more of those symptoms you listed above. Does that mean that I should try one of the methods you proposed?

    • Steven Pigeon says:

      Yes. I would first run memtest86+ to make sure the RAM is OK as well, then have the drive self-test. To run memtest86+, you just create a boot floppy/CD/DVD/USB drive and boot from it. Let it run for a few hours, at least one full pass. Random symptoms may be caused by bad memory.
