Google did a study on hard drives several years ago as reported at Google’s Disk Failure Experience
Just part of the article discussing Google's study:
Google released a fascinating research paper titled Failure Trends in a Large Disk Drive Population (pdf) at this years File and Storage Technologies (FAST ’07) conference. Google collected data on a population of 100,000 disk drives, analyzed it, and wrote it up for our delectation.
Sudden heat death?
One of the most intriguing findings is the relationship between drive temperature and drive mortality. The Google team took temperature readings from SMART records every few minutes for the nine-month period. As the figure here shows, failure rates do not increase when the average temperature increases. At very high temperatures there is a negative effect, but even that is slight. Here’s the graph from the paper:
Drive age has an effect, but again, only at very high temperatures. Here’s that graph:
The Googlers conclude:
In the lower and middle temperature ranges, higher temperatures are not associated with higher failure rates. This is a fairly surprising result, which could indicate that data center or server designers have more freedom than previously thought when setting operating temperatures for equipment that contains disk drives.
Good news for internet data center managers.
The StorageMojo take
There is a lot here and the implications may surprise.
- Disk MTBF numbers significantly understate failure rates. If you plan on AFRs that are 50% higher than MTBFs suggest, you’ll be better prepared.
- For us SOHO users, consider replacing 3 year old disks, or at least get serious about back up.
- Enterprise disk purchasers should demand real data to back up the claimed MTBFs – typically 1 million hours plus – for those costly and now much less studied drives.
- SMART will alert you to some issues, but not most, so the industry should get cracking and come up with something more useful.
- Workload numbers call into question the utility of architectures, like MAID, that rely on turning off disks to extend life. The Googlers didn’t study that application, but if I were marketing MAID I’d get ready for some hard questions.
- Folks who plan and sell cooling should also get ready for tough questions. Maybe cooler isn’t always better. But it sure is a lot more expensive.
- This validates the use of “consumer” drives in data centers because for the first time we have a large-scale population study that we’ve never seen for enterprise drives.