Analysis of a new intra-disk redundancy scheme for high-reliability RAID storage systems in the presence of unrecoverable errors

Authors:
Ajay Dholakia;Evangelos Eleftheriou;Xiao--Yu Hu;Ilias Iliadis;Jai Menon;KK Rao
Affiliations:
IBM Syst. and Tech. Group, Research Triangle Park, NC;IBM Zurich Research Lab, Rüschlikon, Switzerland;IBM Zurich Research Lab, Rüschlikon, Switzerland;IBM Zurich Research Lab, Rüschlikon, Switzerland;IBM Syst. and Tech. Group, San Jose, CA;IBM Almaden Research Center, San Jose, CA
Venue:
SIGMETRICS '06/Performance '06 Proceedings of the joint international conference on Measurement and modeling of computer systems
Year:
2006

Citing 0
Cited 5

Pergamum: replacing tape with energy efficient, reliable, disk-based archival storage

FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
Disk scrubbing versus intra-disk redundancy for high-reliability raid storage systems

SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Disk drive workload captured in logs collected during the field return incoming test

WASL'08 Proceedings of the First USENIX conference on Analysis of system logs
Disk Scrubbing Versus Intradisk Redundancy for RAID Storage Systems

ACM Transactions on Storage (TOS)
Beyond MTTDL: A Closed-Form RAID 6 Reliability Equation

ACM Transactions on Storage (TOS)

Quantified Score

Hi-index	0.00

Visualization

Abstract

Today's data storage systems are increasingly adopting low-cost disk drives that have higher capacity but lower reliability, leading to more frequent rebuilds and to a higher risk of unrecoverable media errors. We propose a new XOR-based intra-disk redundancy scheme, called interleaved parity check (IPC), to enhance the reliability of RAID systems that incurs only negligible I/O performance degradation. The proposed scheme introduces an additional level of redundancy inside each disk, on top of the RAID redundancy across multiple disks. The RAID parity provides protection against disk failures, while the proposed scheme aims to protect against media-related unrecoverable errors.We develop a new model capturing the effect of correlated unrecoverable sector errors and subsequently use it to analyze the proposed scheme as well as the traditional redundancy schemes based on Reed-Solomon (RS) codes and single-parity-check (SPC) codes. We derive closed-form expressions for the mean time to data loss (MTTDL) of RAID 5 and RAID 6 systems in the presence of unrecoverable errors and disk failures. We then combine these results for a comprehensive characterization of the reliability of RAID systems that incorporate the proposed IPC redundancy scheme. Our results show that in the practical case of correlated errors, the proposed scheme provides the same reliability as the optimum albeit more complex RS coding scheme. Finally, the throughput performance of incorporating the intra-disk redundancy on various RAID systems is evaluated by means of event-driven simulations. A detailed description of these contributions is given in [1].