Beyond MTTDL: A Closed-Form RAID 6 Reliability Equation

Authors: Jon G. Elerath; Jiri Schindler
Affiliations: Reliability Consulting Services; NetApp
Venue: ACM Transactions on Storage (TOS)
Year: 2014

Abstract

We introduce a new closed-form equation for estimating the number of data-loss events for a redundant array of inexpensive disks in a RAID-6 configuration. The equation expresses operational failures, their restorations, latent (sector) defects, and disk media scrubbing as time-based distributions that can represent non-homogeneous Poisson processes. It uses two-parameter Weibull distributions, which allow the distributions to take on many different shapes and thus model increasing, decreasing, or constant occurrence rates. This article focuses on the statistical basis of the equation. It also presents time-based distributions of the four processes, derived from an extensive analysis of field data collected over several years from tens of thousands of commercially available systems with hundreds of thousands of disk drives. Our results for RAID-6 groups of size 16 indicate that the closed-form expression is much more accurate than the MTTDL reliability equation and matches the results of computationally intensive Monte Carlo simulations.
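To make the point about Weibull shapes concrete, here is a minimal sketch. It assumes the standard two-parameter Weibull form, with CDF F(t) = 1 − exp(−(t/η)^β) and hazard (occurrence rate) h(t) = (β/η)(t/η)^(β−1), where β is the shape and η the scale. For contrast, it also computes the commonly cited exponential-model MTTDL approximation for a RAID 6 group of N disks, MTTF³ / (N(N−1)(N−2)·MTTR²). The function names and any parameter values used with them are hypothetical illustrations. They are not the paper's closed-form equation or its field-data fits.

```python
import math


def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Occurrence rate h(t) of a two-parameter Weibull distribution.

    shape < 1: decreasing rate; shape == 1: constant rate (exponential);
    shape > 1: increasing rate.
    """
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape and scale must be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_cdf(t: float, shape: float, scale: float) -> float:
    """Probability that the event has occurred by time t."""
    if t < 0:
        return 0.0
    return 1.0 - math.exp(-((t / scale) ** shape))


def raid6_mttdl(mttf_hours: float, mttr_hours: float, n_disks: int) -> float:
    """Classic MTTDL approximation for a RAID 6 group (assumed baseline).

    Assumes constant (exponential) failure and repair rates and ignores
    latent sector defects and scrubbing, which are the simplifications
    the closed-form equation is meant to move beyond.
    """
    if n_disks < 3:
        raise ValueError("RAID 6 needs at least 3 disks")
    return mttf_hours ** 3 / (
        n_disks * (n_disks - 1) * (n_disks - 2) * mttr_hours ** 2
    )


if __name__ == "__main__":
    # Hypothetical scale of 100,000 hours; compare the rate at two ages.
    for beta in (0.8, 1.0, 1.2):
        early = weibull_hazard(1_000, beta, 100_000)
        late = weibull_hazard(10_000, beta, 100_000)
        print(f"shape={beta}: h(1e3)={early:.3e}, h(1e4)={late:.3e}")
    # Hypothetical 16-disk group, 1e6 h MTTF, 12 h MTTR.
    print(f"MTTDL = {raid6_mttdl(1e6, 12, 16):.3e} hours")
```

Running it shows the rate falling with age for shape 0.8, staying flat for shape 1.0 (the constant-rate case that MTTDL implicitly assumes), and rising for shape 1.2.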