Beyond MTTDL: A Closed-Form RAID 6 Reliability Equation

  • Authors: Jon G. Elerath; Jiri Schindler
  • Affiliations: Reliability Consulting Services; NetApp
  • Venue: ACM Transactions on Storage (TOS)
  • Year: 2014

Abstract

We introduce a new closed-form equation for estimating the number of data-loss events for a redundant array of inexpensive disks in a RAID-6 configuration. The equation expresses operational failures, their restorations, latent (sector) defects, and disk media scrubbing by time-based distributions that can represent non-homogeneous Poisson processes. It uses two-parameter Weibull distributions, which allow the distributions to take on many different shapes, modeling increasing, decreasing, or constant occurrence rates. This article focuses on the statistical basis of the equation. It also presents time-based distributions of the four processes based on an extensive analysis of field data collected over several years from tens of thousands of commercially available systems with hundreds of thousands of disk drives. Our results for RAID-6 groups of size 16 indicate that the closed-form expression yields much more accurate results than the MTTDL reliability equation and matches computationally intensive Monte Carlo simulations.
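
To illustrate how a two-parameter Weibull distribution can model decreasing, constant, or increasing occurrence rates, the following minimal Python sketch evaluates the standard Weibull hazard function h(t) = (β/η)(t/η)^(β−1) for three shape values. It is not the paper's closed-form equation. The function names, the scale value of 100,000 hours, and the shape values 0.8, 1.0, and 1.2 are illustrative assumptions, not parameters taken from the article or its field data.

```python
import math


def weibull_hazard(t, shape, scale):
    """Standard Weibull hazard rate: (shape/scale) * (t/scale)**(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_cdf(t, shape, scale):
    """Probability that an event has occurred by time t."""
    return 1.0 - math.exp(-((t / scale) ** shape))


if __name__ == "__main__":
    scale = 100_000.0  # hypothetical characteristic life in hours (assumption)
    times = (1_000.0, 10_000.0, 50_000.0)
    # shape < 1: decreasing rate; shape == 1: constant rate; shape > 1: increasing rate
    for shape in (0.8, 1.0, 1.2):
        rates = [weibull_hazard(t, shape, scale) for t in times]
        print(f"shape={shape}: " + ", ".join(f"{r:.3e}" for r in rates))
```

With a shape of exactly 1 the hazard is constant and the Weibull reduces to the exponential distribution, which is the constant-rate assumption underlying MTTDL-style calculations. Shapes other than 1 give the time-varying rates that the abstract describes.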