Effect of codeword placement on the reliability of erasure coded data storage systems

Authors:
Vinodh Venkatesan;Ilias Iliadis
Affiliations:
IBM Research - Zurich, Rüschlikon, Switzerland;IBM Research - Zurich, Rüschlikon, Switzerland
Venue:
QEST'13 Proceedings of the 10th international conference on Quantitative Evaluation of Systems
Year:
2013

Abstract

Modern data storage systems employ advanced erasure codes to protect data from storage node failures because of their ability to provide high data reliability at high storage efficiency. In contrast to previous studies, we consider the practical case where the length of codewords in an erasure coded system is much smaller than the number of storage nodes in the system. In this case, there exists a large number of possible ways in which different codewords can be stored across the nodes of the system. In this paper, it is shown that a declustered placement of codewords can significantly improve system reliability compared to other placement schemes. A detailed reliability analysis is presented that accounts for the rebuild times involved, the amounts of partially rebuilt data when additional nodes fail during rebuild, and an intelligent rebuild process that attempts to rebuild the most critical codewords first.
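
To make the placement question concrete, the following is a minimal sketch and not the model analyzed in the paper. It assumes, purely for illustration, that a "clustered" placement splits the n nodes into disjoint groups of m nodes with every codeword confined to one group, and that a "declustered" placement spreads codewords over every possible set of m nodes. The function names and the values n = 12, m = 3 are hypothetical. The sketch counts how many surviving nodes share a codeword with a failed node, which is one intuition for why spreading codewords can help: more nodes can take part in the rebuild.

```python
from itertools import combinations


def clustered_placement(n, m):
    """Assumed 'clustered' scheme: disjoint groups of m nodes,
    each codeword stored entirely within one group."""
    if n % m != 0:
        raise ValueError("n must be a multiple of m for this sketch")
    return [tuple(range(start, start + m)) for start in range(0, n, m)]


def declustered_placement(n, m):
    """Assumed 'declustered' scheme: one codeword on every m-subset
    of the n nodes, so codewords are spread over all nodes."""
    return list(combinations(range(n), m))


def rebuild_helpers(placement, failed):
    """Surviving nodes that share at least one codeword with the failed node."""
    helpers = set()
    for nodes in placement:
        if failed in nodes:
            helpers.update(nodes)
    helpers.discard(failed)
    return helpers


# Hypothetical system: 12 nodes, codewords of length 3.
n, m = 12, 3
clustered = clustered_placement(n, m)
declustered = declustered_placement(n, m)

print(len(rebuild_helpers(clustered, 0)))    # 2  (only the m - 1 group peers)
print(len(rebuild_helpers(declustered, 0)))  # 11 (all n - 1 surviving nodes)
```

Under these assumptions, a node failure in the clustered layout can draw on only m - 1 peers, whereas in the declustered layout all n - 1 surviving nodes hold symbols of the affected codewords. The sketch ignores rebuild bandwidth, partially rebuilt data, and the prioritized rebuild order that the abstract says the analysis accounts for.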