A Highly Accurate Method for Assessing Reliability of Redundant Arrays of Inexpensive Disks (RAID)

  • Authors:
  • Jon Elerath;Michael Pecht

  • Affiliations:
  • Network Appliance, Sunnyvale;University of Maryland, College Park

  • Venue:
  • IEEE Transactions on Computers
  • Year:
  • 2009

Abstract

The statistical bases for current models of RAID reliability are reviewed, and a highly accurate alternative is provided and justified. This new model corrects statistical errors associated with the pervasive assumption that system (RAID group) times to failure follow a homogeneous Poisson process, and corrects errors associated with assuming that the time-to-failure and time-to-restore distributions are exponential. Statistical justification for the new model uses the theory of reliability for repairable systems. Four critical component distributions are developed from field data: times to catastrophic failure, reconstruction and restoration, read errors, and disk data scrubs. Model results have been verified and predict between 2 and 1,500 times as many double disk failures as estimates made using the mean time to data loss (MTTDL) method. Model results are compared to system-level field data for RAID groups of 14 drives and show excellent correlation and greater accuracy than either MTTDL.
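For orientation, the mean time to data loss baseline that the abstract criticizes can be sketched in a few lines. The sketch below uses the commonly cited textbook approximation for a single-parity group, MTTDL = MTTF^2 / (N (N - 1) MTTR), which rests on exactly the assumptions the abstract calls into question: constant failure and repair rates, and a homogeneous Poisson process for data-loss events. The abstract does not state which form of the formula the paper uses, so this form is an assumption. The function names are hypothetical and any input values are illustrative only, not taken from the paper.

```python
HOURS_PER_YEAR = 8760.0


def mttdl_single_parity(n_drives: int, mttf_hours: float, mttr_hours: float) -> float:
    """Textbook MTTDL approximation for one single-parity RAID group.

    Assumes exponentially distributed times to failure and to restore,
    which is the simplification the abstract criticizes.
    """
    if n_drives < 2:
        raise ValueError("a parity-protected group needs at least 2 drives")
    if mttf_hours <= 0 or mttr_hours <= 0:
        raise ValueError("MTTF and MTTR must be positive")
    # A first failure occurs at rate N / MTTF; a second failure among the
    # remaining N - 1 drives during the restore window loses data.
    return mttf_hours ** 2 / (n_drives * (n_drives - 1) * mttr_hours)


def expected_data_loss_events(n_groups: int, mission_hours: float, mttdl_hours: float) -> float:
    """Expected double disk failures across a fleet under the HPP assumption.

    With a constant event rate of 1 / MTTDL per group, the expected count
    is simply groups * time / MTTDL.
    """
    return n_groups * mission_hours / mttdl_hours


if __name__ == "__main__":
    # Illustrative inputs only (hypothetical, not from the paper):
    # 14-drive groups, 1,000,000 h drive MTTF, 12 h restore time.
    mttdl = mttdl_single_parity(14, 1_000_000.0, 12.0)
    events = expected_data_loss_events(1000, 10 * HOURS_PER_YEAR, mttdl)
    print(f"MTTDL: {mttdl:.3e} hours")
    print(f"Expected data-loss events, 1000 groups over 10 years: {events:.3f}")
```

Because this estimate is linear in time and ignores read errors and scrubbing, it cannot reflect the four field-derived distributions the abstract lists; it is shown only as the point of comparison.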