Online availability upgrades for parity-based RAIDs through supplementary parity augmentations

Authors:
Lei Tian;Qiang Cao;Hong Jiang;Dan Feng;Changsheng Xie;Qin Xin
Affiliations:
Huazhong University of Science and Technology, China/University of Nebraska-Lincoln, Lincoln, NE;-;University of Nebraska-Lincoln, Lincoln, NE;Huazhong University of Science and Technology, China;Huazhong University of Science and Technology, China;-
Venue:
ACM Transactions on Storage (TOS)
Year:
2011

Citing 32
Cited 2

A case for redundant arrays of inexpensive disks (RAID)

SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data
The Performance of Parity Placements in Disk Arrays

IEEE Transactions on Computers
EVENODD: An Efficient Scheme for Tolerating Double Disk Failures in RAID Architectures

IEEE Transactions on Computers - Special issue on fault-tolerant computing
A tutorial on Reed-Solomon coding for fault-tolerance in RAID-like systems

Software—Practice & Experience
Automatic Recovery from Disk Failure in Continuous-Media Servers

IEEE Transactions on Parallel and Distributed Systems
Disk Scrubbing in Large Archival Storage Systems

MASCOTS '04 Proceedings of the IEEE Computer Society's 12th Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
Energy conservation in heterogeneous server clusters

Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Row-Diagonal Parity for Double Disk Failure Correction

FAST '04 Proceedings of the 3rd USENIX Conference on File and Storage Technologies
Improving Storage System Availability with D-GRAID

FAST '04 Proceedings of the 3rd USENIX Conference on File and Storage Technologies
A design for high-performance flash disks

ACM SIGOPS Operating Systems Review - Systems work at Microsoft Research
Enhanced Reliability Modeling of RAID Storage Systems

DSN '07 Proceedings of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks
An analysis of latent sector errors in disk drives

Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Idleness is not sloth

TCON'95 Proceedings of the USENIX 1995 Technical Conference Proceedings
A comparison of file system workloads

ATEC '00 Proceedings of the annual conference on USENIX Annual Technical Conference
Disk failures in the real world: what does an MTTF of 1,000,000 hours mean to you?

FAST '07 Proceedings of the 5th USENIX conference on File and Storage Technologies
Failure trends in a large disk drive population

FAST '07 Proceedings of the 5th USENIX conference on File and Storage Technologies
PARAID: a gear-shifting power-aware RAID

FAST '07 Proceedings of the 5th USENIX conference on File and Storage Technologies
PRO: a popularity-based multi-threaded reconstruction optimization for RAID-structured storage systems

FAST '07 Proceedings of the 5th USENIX conference on File and Storage Technologies
AFRAID: a frequently redundant array of independent disks

ATEC '96 Proceedings of the 1996 annual conference on USENIX Annual Technical Conference
Exploiting Platform Heterogeneity for Power Efficient Data Centers

ICAC '07 Proceedings of the Fourth International Conference on Autonomic Computing
A new intra-disk redundancy scheme for high-reliability RAID storage systems in the presence of unrecoverable errors

ACM Transactions on Storage (TOS)
The RAID-6 liberation codes

FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
Are disks the dominant contributor for storage failures?: a comprehensive study of storage subsystem failure characteristics

FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
Write off-loading: practical power management for enterprise storage

FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
Disk scrubbing versus intra-disk redundancy for high-reliability raid storage systems

SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Idle read after write: IRAW

ATC'08 USENIX 2008 Annual Technical Conference on Annual Technical Conference
Design tradeoffs for SSD performance

ATC'08 USENIX 2008 Annual Technical Conference on Annual Technical Conference
MICRO: A Multilevel Caching-Based Reconstruction Optimization for Mobile Storage Systems

IEEE Transactions on Computers
WorkOut: I/O workload outsourcing for boosting RAID reconstruction performance

FAST '09 Proceedings of the 7th conference on File and storage technologies
Restrained utilization of idleness for transparent scheduling of background tasks

Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
Collaboration-Oriented Data Recovery for Mobile Disk Arrays

ICDCS '09 Proceedings of the 2009 29th IEEE International Conference on Distributed Computing Systems
Differential RAID: rethinking RAID for SSD reliability

Proceedings of the 5th European conference on Computer systems

Rebuild processing in RAID5 with emphasis on the supplementary parity augmentation method

ACM SIGARCH Computer Architecture News
IDO: intelligent data outsourcing with improved RAID reconstruction performance in large-scale data centers

lisa'12 Proceedings of the 26th international conference on Large Installation System Administration: strategies, tools, and techniques

Abstract

In this article, we propose a simple but powerful online availability upgrade mechanism, Supplementary Parity Augmentations (SPA), to address the availability issue in parity-based RAID systems. The basic idea of SPA is to store and update supplementary parity units on one or a few newly added spare disks while the RAID system remains online in operational mode, thereby improving reconstruction performance while tolerating multiple disk failures and latent sector errors simultaneously. By applying exclusive-OR operations appropriately among supplementary parity, full parity, and data units, SPA can reconstruct the data on failed disks with a fraction of the original overhead, where that fraction is proportional to the supplementary parity coverage. This significantly reduces the overhead of data regeneration and shortens recovery time in parity-based RAID systems. Our extensive trace-driven simulation study shows that SPA can significantly improve the reconstruction performance of RAID5 and RAID5+0 systems, at an acceptable performance overhead in operational mode. Moreover, our analytical reliability modeling and sequential Monte Carlo simulation demonstrate that SPA consistently more than doubles the MTTDL of the RAID5 system and noticeably improves the reliability of the RAID5+0 system.
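
The exclusive-OR relationship among supplementary parity, full parity, and data units can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stripe width, the half-stripe coverage, and the names `xor_units`, `covered`, and `rebuild` are all assumptions chosen for illustration. The sketch only shows why a supplementary parity unit that covers part of a stripe lets a lost data unit be rebuilt from fewer surviving units than a conventional RAID5 rebuild would read.

```python
from functools import reduce

def xor_units(units):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*units))

# Hypothetical stripe: four data units D0..D3 plus one full parity unit P.
data = [bytes([i + 1] * 4) for i in range(4)]
full_parity = xor_units(data)                        # P = D0 ^ D1 ^ D2 ^ D3

# Assumed coverage: the supplementary parity S covers only D0 and D1.
covered = [0, 1]
supp_parity = xor_units([data[i] for i in covered])  # S = D0 ^ D1

def rebuild(failed, surviving):
    """Rebuild data unit `failed` from `surviving` (index -> unit).

    Returns the rebuilt unit and the number of units that had to be read.
    """
    if failed in covered:
        # Inside the covered set: S and the other covered units suffice.
        peers = [surviving[i] for i in covered if i != failed]
        reads = peers + [supp_parity]
    else:
        # Outside the covered set: P ^ S is the parity of the uncovered units.
        peers = [surviving[i] for i in range(len(data))
                 if i not in covered and i != failed]
        reads = peers + [full_parity, supp_parity]
    return xor_units(reads), len(reads)

# A conventional RAID5 rebuild of one unit in this stripe reads 4 units
# (three surviving data units plus P).
for failed in range(len(data)):
    surviving = {i: u for i, u in enumerate(data) if i != failed}
    unit, reads = rebuild(failed, surviving)
    print(failed, unit == data[failed], reads)
```

In this toy layout every rebuild reads fewer than the four units a conventional RAID5 rebuild would need, and the saving depends on how much of the stripe the supplementary parity covers. How SPA actually places and updates supplementary parity on the spare disks is described in the article itself, not here.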