From some posts to the Charlotte [North Carolina, U.S.A.] Linux User
Group 'discuss' mailing list comes a question about whether a
listmember's hard drive is failing, and a helpful suggestion for
determining and monitoring the drive's health. Briefly, one of the
listmember's log files contained:
Aug 2 17:01:44 tc kernel: hda: irq timeout: status=0xd0 { Busy }
Aug 2 17:01:44 tc kernel:
Aug 2 17:01:44 tc kernel: ide0: reset timed-out, status=0xd0
Aug 2 17:01:44 tc kernel: hda: status timeout: status=0xd0 { Busy }
Aug 2 17:01:44 tc kernel:
Aug 2 17:01:44 tc kernel: hda: drive not ready for command
Aug 2 17:01:44 tc kernel: ide0: reset: success
While warning the listmember to copy data off /dev/hda, another
listmember suggested installing smartmontools, which in Debian is a
simple `apt-get install smartmontools` away. The smartmontools page
includes installation instructions for RPM-based Linux distros and
other platforms.
I installed smartmontools, and read the manpages for smartd,
smartd.conf and smartctl.
I needed to edit /etc/smartd.conf and /etc/default/smartmontools.
Specifically, I commented out DEVICESCAN in /etc/smartd.conf and
uncommented the /dev/hda line in that file, and I also chose to
append '-m [my.email.account@my.domain]' to the /dev/hda line so
that warnings are mailed to me. I also edited
/etc/default/smartmontools so that its lines now read
'enable_smart="/dev/hda"' and 'start_smartd=yes'.
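For reference, the edits described above might look roughly like the
following. This is a sketch rather than a copy of my files: the '-a'
directive (monitor all attributes) is an assumption about what the
stock /dev/hda line contains, any directives that followed DEVICESCAN
are left out, and the mail address is a placeholder.

```
# /etc/smartd.conf (relevant lines only)
# DEVICESCAN is commented out so that only the listed drive is monitored.
#DEVICESCAN
# '-a' is assumed here; '-m' sets the address that warnings are mailed to.
/dev/hda -a -m my.email.account@my.domain

# /etc/default/smartmontools
enable_smart="/dev/hda"
start_smartd=yes
```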
After installing smartmontools, a drive's current attribute values
can be reported by running `smartctl -A /dev/hdx` as root, where 'x'
is the letter assigned to the drive. The output includes VALUE,
WORST, and THRESH columns for each attribute.
When I first read the output on my desktop, I was concerned because
VALUE and WORST were greater than THRESH, but when I carefully
reread the post to discuss@charlug.org, I saw that I had it
backwards: failure is imminent when VALUE and WORST are less than
THRESH, not when they are greater than THRESH.
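To avoid misreading the columns again, the comparison can be left to
a small script. The following is only a sketch: the function name
flag_failing_attributes is made up for illustration, and it assumes
the usual `smartctl -A` column order (ID#, ATTRIBUTE_NAME, FLAG,
VALUE, WORST, THRESH, ...). It treats "at or below THRESH" as failed,
which is how the smartctl manpage words it, and prints nothing when
every attribute is above its threshold.

```shell
#!/bin/sh
# Reads `smartctl -A` output on standard input and prints one line for
# each attribute whose VALUE or WORST is at or below THRESH.
flag_failing_attributes() {
    awk '
        # Attribute rows start with a numeric ID; the header row does not.
        $1 ~ /^[0-9]+$/ && $6 ~ /^[0-9]+$/ {
            value = $4 + 0; worst = $5 + 0; thresh = $6 + 0
            if (value <= thresh)
                print $2 ": VALUE " value " is at or below THRESH " thresh
            else if (worst <= thresh)
                print $2 ": WORST " worst " was at or below THRESH " thresh
        }
    '
}
```

As root, something like `smartctl -A /dev/hda | flag_failing_attributes`
would then list only the attributes worth worrying about.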