Efficient dispersal of information for security, load balancing, and fault tolerance
Journal of the ACM (JACM)
Randomized algorithms
OceanStore: an architecture for global-scale persistent storage
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Wide-area cooperative storage with CFS
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Erasure Coding Vs. Replication: A Quantitative Comparison
IPTPS '01 Revised Papers from the First International Workshop on Peer-to-Peer Systems
Farsite: federated, available, and reliable storage for an incompletely trusted environment
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Total recall: system support for automated availability management
NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
High availability in DHTs: erasure coding vs. replication
IPTPS'05 Proceedings of the 4th international conference on Peer-to-Peer Systems
A deterministic algorithm of single failed node recovery in MSR-based distributed storage systems
ACM SIGMETRICS Performance Evaluation Review
NCCloud: applying network coding for the storage repair in a cloud-of-clouds
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
Hi-index | 0.07 |
This paper studies the recovery from multiple node failures in distributed storage systems. We design a mutually cooperative recovery (MCR) mechanism for multiple node failures. Via a cut-based analysis of the information flow graph, we obtain a lower bound of maintenance bandwidth based on MCR. For MCR, we also propose a transmission scheme and design a linear network coding scheme based on (n, k) strong-MDS code, which is a generalization of (n, k) MDS code. We prove that the maintenance bandwidth based on our transmission and coding schemes matches the lower bound, so the lower bound is tight and the transmission scheme and coding scheme for MCR are optimal. We also give numerical comparisons of MCR with other redundancy recovery mechanisms in storage cost and maintenance bandwidth to show the advantage of MCR.