Understanding Hard Disk Drive Failure


Internal View of a Hard Disk Drive

Hard disk drive failure is one of the most common and most critical failures on any laptop, desktop or server computer. Since the programs and data on a computer depend on the hard drive operating correctly, any problem with the hard disk drive must be dealt with through repair or replacement.

Before troubleshooting a hard drive failure, it is necessary to understand the basic operation of a typical hard drive.

Hard drives are small sealed units, built in a 3.5″ or 2.5″ form factor, found inside most computers. Inside the hard disk drive is a motor that spins one or more metal discs (platters), commonly at 5400 rpm or 7200 rpm. Each side of a spinning disc has an electromagnetic read-write head attached to an arm that pivots across the disc.

When information is saved to a hard drive, the read-write head uses electricity to create a small magnetic field. This magnetic field is used to magnetize areas on a hard drive, essentially setting the magnetic field of the metal on the disc in a specific orientation. This magnetic orientation is interpreted as a ‘0’ or ‘1’ based on which way the magnetic field is pointing.

When the time comes to read the data back, the read-write head checks the magnetic orientation of each spot on the hard drive. For the data to be read back correctly, everything has to work perfectly. This means that the areas that were magnetized when written must stay magnetized without changing.

The fundamental problem with the design of all hard drives is that the regions that are magnetized are metal crystals. At the microscopic level, the metal crystal area on a hard drive is not a symmetric shape that magnetizes perfectly or consistently. Instead, the metal crystal structure is irregular in shape, so not all areas magnetize with the same strength or consistency. This results in variation in the strength of the magnetization across the hard drive.

Since the hard drive relies on magnetism to work, a lot of things can go wrong that create errors:

  • areas on the disc that fail to magnetize properly while being written become write failures.
  • areas on the disc that fail to read back clearly require multiple attempts, creating delays.
  • areas on the disc that fail completely to read are read failures and result in lost data.
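The second and third cases above can be sketched as a simple retry loop. This is only an illustration under stated assumptions, not how any particular drive firmware works: `read_with_retries` and `make_weak_sector` are hypothetical names, and the "sector" is a stand-in that fails a fixed number of times before reading cleanly.

```python
from typing import Callable, Optional, Tuple


def read_with_retries(
    read_once: Callable[[], Optional[bytes]],
    max_attempts: int = 5,
) -> Tuple[Optional[bytes], int]:
    """Call read_once() until it returns data or the attempts run out.

    Returns (data, attempts_used). data is None when every attempt
    failed, which corresponds to a read failure and lost data.
    """
    for attempt in range(1, max_attempts + 1):
        data = read_once()
        if data is not None:
            return data, attempt
    return None, max_attempts


def make_weak_sector(
    fails_before_success: int, payload: bytes
) -> Callable[[], Optional[bytes]]:
    """Hypothetical weak sector: fails a set number of times, then reads."""
    remaining = fails_before_success

    def read_once() -> Optional[bytes]:
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            return None  # this attempt did not read back clearly
        return payload

    return read_once


# A healthy sector reads on the first try; a weak one costs extra attempts.
healthy = read_with_retries(make_weak_sector(0, b"data"))
weak = read_with_retries(make_weak_sector(3, b"data"))
dead = read_with_retries(make_weak_sector(10, b"data"), max_attempts=5)
```

Each extra attempt means waiting for the spot to pass under the head again, which is why weak areas show up as delays before they show up as lost data.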

If the read-write head becomes contaminated with microscopic bits of metal or dust, it may fail to read any part of the hard drive, resulting in drive failure. While the hard drive has a particulate filter inside to trap contaminants, the magnetic nature of the read-write head can attract very small metal particles that are dislodged from the disc. This metal dust will interfere with the read/write head and may also damage the disc platters.

The actual metal coating on the discs may be only 300 atoms thick. This is achieved using thin-film sputtering, where energetic ions knock metal atoms off a source target so that they settle onto an aluminum disc. The result is a uniformly thin and flat layer of metal that allows the read-write head to float on a cushion of air so thin that no light is visible through the gap. However, the close proximity of the read-write head, along with the thin layer of metal and powerful magnetic fields, may sometimes rip layers of metal off the disc, contaminating the read-write head and damaging the disc.

Another type of failure involves motor wear. Whether a hard drive spindle motor uses ball bearings or fluid dynamic bearings, any shift in the bearing may drop the platters. As the platters shift, the read-write heads on one side of the disc are pushed closer while the other side moves away. This results in total disc failure. The solution to this problem involves placing a second spindle bearing on the opposite side of the spindle motor, which many disc drives lack.

If the pivot arm goes out of alignment from wear or failure, it will fail to read the disc. If the disc drive motor burns out, the disc will not spin. If the external circuit board malfunctions or fails, the hard drive will fail to be recognized on startup.

Since hard drives are subject to malfunction in so many different ways, they include very sophisticated and effective error correction. The error correction is additional information that is saved along with the data when it is written, and this extra information is used to check the accuracy of the data when it is read back.

Small errors can be resolved instantly by relying on the error correction information. However, above a certain threshold, even the error correction cannot correct the data. This is when delays or failures become evident. These failures are not caused by software or viruses. Instead, they are caused either by failures of the hard drive components, or by external shock that disrupts the drive while running.
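The idea of a correction threshold can be illustrated with a toy code. The sketch below uses a Hamming(7,4) code, which stores 3 check bits alongside every 4 data bits. This is an assumption chosen for brevity: real drives use far stronger codes (Reed-Solomon or LDPC, for example) over whole sectors, but the behavior is the same in kind. One flipped bit is repaired silently, while two flipped bits exceed what the code can fix and the data comes back wrong.

```python
from typing import List


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits as a 7-bit Hamming codeword.

    Layout (positions 1..7): p1 p2 d1 p4 d2 d3 d4
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(codeword: List[int]) -> List[int]:
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(codeword)  # work on a copy, leave the input untouched
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 0 means no error was detected
    if syndrome:
        c[syndrome - 1] ^= 1  # the syndrome is the 1-based error position
    return [c[2], c[4], c[5], c[6]]


data = [1, 0, 1, 1]
stored = encode(data)

# One bad bit: below the threshold, so the data is recovered exactly.
one_error = list(stored)
one_error[4] ^= 1
recovered = decode(one_error)

# Two bad bits: above the threshold, so the "correction" yields wrong data.
two_errors = list(stored)
two_errors[0] ^= 1
two_errors[1] ^= 1
corrupted = decode(two_errors)
```

Here `recovered` equals the original data and `corrupted` does not. A real drive detects the second situation and reports a read error rather than returning bad data, which is the point at which retries and delays begin.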

Data recovery on a failing hard drive involves either making the error correction on the hard drive perform more attempts, or using special software to bypass the error correction on the hard drive. Either approach can take a long time, from 1 to 10 days, because every sector on the hard drive must be read and bad sectors may be re-read up to one thousand times to recover the data.

For example, on an 80 GB hard drive, there are roughly 156 million 512-byte sectors that must be read to recover the entire hard drive. For each bad sector, the diagnostic software may re-read that sector up to 1,000 times to reconstruct the data.
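The arithmetic behind this example can be checked with a few lines of Python. This is a rough sketch: it assumes the drive capacity is quoted in decimal gigabytes, and the bad-sector count of 10,000 is a made-up figure used only to show how quickly re-reads add up.

```python
SECTOR_BYTES = 512
MAX_REREADS = 1_000


def sector_count(capacity_bytes: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Number of whole sectors that fit in the given capacity."""
    return capacity_bytes // sector_bytes


def worst_case_reads(
    total_sectors: int, bad_sectors: int, max_rereads: int = MAX_REREADS
) -> int:
    """Upper bound on read operations: good sectors are read once,
    bad sectors are read up to max_rereads times each."""
    good_sectors = total_sectors - bad_sectors
    return good_sectors + bad_sectors * max_rereads


capacity = 80 * 10**9             # assumption: 80 GB in decimal units
total = sector_count(capacity)    # 156_250_000 sectors

hypothetical_bad = 10_000         # illustrative value, not from a real drive
reads = worst_case_reads(total, hypothetical_bad)  # 166_240_000 reads
```

In this made-up case, 10,000 bad sectors out of more than 156 million add about 10 million extra read operations. Each failed read is also far slower than a clean one, which is how a recovery can stretch into days.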

When data recovery cannot be performed using the software method described above, it may still be possible to recover the data using a national data recovery service such as OnTrack Data Systems in Minnesota. A service of this kind can go well beyond what software alone can do, since it will disassemble the hard drive in a clean room and use specialized equipment such as a servo-writer to read information from the discs.

This approach bypasses all of the failed components, and is very effective at reading and recovering data from a failed hard drive. However, the cost of outside data recovery starts at $100 and can approach $1600, depending upon the size of the drive and severity of damage.

Nearly every modern hard drive includes an error tracking feature known as “SMART”, short for Self-Monitoring, Analysis and Reporting Technology. The SMART information is a lifetime log of ten or more different categories of errors and usage, including the power-on time for the hard drive. This information can be read at any time using a variety of software programs that are specifically designed to display the SMART history stored on the hard drive. The SMART information cannot be changed or edited by the user.

Using the SMART information, a history of errors can be viewed and used to assess a drive for failure. Typically, hard drives are considered failing if they have relocated or re-allocated sectors, since this indicates a bad sector on the hard drive that has been removed from use. Some hard drives differentiate between relocated and uncorrectable sectors, with uncorrectable sectors posing a greater risk to data.
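The rule of thumb above can be written as a small check. This is a sketch under stated assumptions: the function `assess_drive` is hypothetical, and it expects the raw SMART values to have already been read from the drive by some other tool and passed in as a plain dictionary keyed by attribute ID. IDs 5 (reallocated sectors) and 198 (offline uncorrectable sectors) are the commonly used numbers, but attribute naming and support vary between manufacturers.

```python
from typing import Dict

REALLOCATED_SECTORS = 5      # sectors remapped to spare areas
UNCORRECTABLE_SECTORS = 198  # sectors that could not be read or corrected


def assess_drive(raw_values: Dict[int, int]) -> str:
    """Classify a drive from raw SMART counts.

    Returns "at-risk" when uncorrectable sectors are logged,
    "failing" when only reallocated sectors are logged,
    and "ok" when neither count is above zero.
    """
    if raw_values.get(UNCORRECTABLE_SECTORS, 0) > 0:
        return "at-risk"   # data may already be unreadable
    if raw_values.get(REALLOCATED_SECTORS, 0) > 0:
        return "failing"   # bad sectors have been removed from use
    return "ok"


# Illustrative inputs only; these are not readings from a real drive.
clean_drive = assess_drive({5: 0, 198: 0})
worn_drive = assess_drive({5: 12, 198: 0})
damaged_drive = assess_drive({5: 12, 198: 3})
```

Treating any non-zero count as a warning is a deliberately strict choice that matches the rule described above. A drive flagged this way should be backed up first and diagnosed second.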
