A case for redundant arrays of inexpensive disks (RAID)
SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data
RAID: high-performance, reliable secondary storage
ACM Computing Surveys (CSUR)
Erasure Coding Vs. Replication: A Quantitative Comparison
IPTPS '01 Revised Papers from the First International Workshop on Peer-to-Peer Systems
Evaluation of Distributed Recovery in Large-Scale Storage Systems
HPDC '04 Proceedings of the 13th IEEE International Symposium on High Performance Distributed Computing
Mirrored Disk Organization Reliability Analysis
IEEE Transactions on Computers
Higher reliability redundant disk arrays: Organization, operation, and coding
ACM Transactions on Storage (TOS)
Availability in globally distributed storage systems
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Reliability analysis of deduplicated and erasure-coded storage
ACM SIGMETRICS Performance Evaluation Review
Reliability of Clustered vs. Declustered Replica Placement in Data Storage Systems
MASCOTS '11 Proceedings of the 2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems
Reliability of Data Storage Systems under Network Rebuild Bandwidth Constraints
MASCOTS '12 Proceedings of the 2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems
A General Reliability Model for Data Storage Systems
QEST '12 Proceedings of the 2012 Ninth International Conference on Quantitative Evaluation of Systems
Hi-index | 0.00 |
Modern data storage systems employ advanced erasure codes to protect data from storage node failures because of their ability to provide high data reliability at high storage efficiency. In contrast to previous studies, we consider the practical case where the length of codewords in an erasure coded system is much smaller than the number of storage nodes in the system. In this case, there exists a large number of possible ways in which different codewords can be stored across the nodes of the system. In this paper, it is shown that a declustered placement of codewords can significantly improve system reliability compared to other placement schemes. A detailed reliability analysis is presented that accounts for the rebuild times involved, the amounts of partially rebuilt data when additional nodes fail during rebuild, and an intelligent rebuild process that attempts to rebuild the most critical codewords first.