Cooperative recovery of distributed storage systems from multiple losses with network coding

Authors:
Yuchong Hu;Yinlong Xu;Xiaozhao Wang;Cheng Zhan;Pei Li
Affiliations:
School of Computer Science & Technology, University of Science & Technology of China and Key Laboratory on High Performance Computing, Anhui Province (all authors)
Venue:
IEEE Journal on Selected Areas in Communications
Year:
2010



Abstract

This paper studies recovery from multiple node failures in distributed storage systems. We design a mutually cooperative recovery (MCR) mechanism for multiple node failures. Via a cut-based analysis of the information flow graph, we obtain a lower bound on the maintenance bandwidth under MCR. For MCR, we also propose a transmission scheme and design a linear network coding scheme based on an (n, k) strong-MDS code, which generalizes the (n, k) MDS code. We prove that the maintenance bandwidth achieved by our transmission and coding schemes matches the lower bound, so the bound is tight and both schemes are optimal for MCR. We also compare MCR numerically with other redundancy recovery mechanisms in terms of storage cost and maintenance bandwidth to show the advantage of MCR.
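As background for the (n, k) MDS code the abstract builds on, the following minimal sketch illustrates the defining property that any k of the n coded blocks suffice to recover the original data. It is a textbook Reed-Solomon-style construction over a small prime field, not the paper's strong-MDS code or its MCR transmission scheme. The field size `P`, the function names `encode` and `decode`, and the sample values are all assumptions chosen for illustration.

```python
from itertools import combinations

P = 257  # small prime field, an assumption made only for this illustration


def encode(data, n):
    """Evaluate the polynomial whose coefficients are `data` at x = 1..n (mod P)."""
    return [sum(c * pow(x, i, P) for i, c in enumerate(data)) % P
            for x in range(1, n + 1)]


def decode(shares, k):
    """Recover the k coefficients from any k (x, y) pairs via Gauss-Jordan elimination mod P."""
    # Augmented Vandermonde system: one row [1, x, x^2, ..., y] per surviving node.
    rows = [[pow(x, i, P) for i in range(k)] + [y] for x, y in shares]
    for col in range(k):
        # Distinct evaluation points guarantee a nonzero pivot exists.
        pivot = next(r for r in range(col, k) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, P)  # modular inverse (Python 3.8+)
        rows[col] = [v * inv % P for v in rows[col]]
        for r in range(k):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[col])]
    return [rows[i][k] for i in range(k)]


n, k = 5, 3
data = [42, 7, 199]                      # k source symbols, each below P
coded = encode(data, n)                  # one coded symbol per storage node
nodes = list(enumerate(coded, start=1))  # (evaluation point, symbol) pairs

# MDS property: every choice of k surviving nodes reconstructs the data.
ok = all(decode(list(subset), k) == data for subset in combinations(nodes, k))
print(ok)
```

In this toy setting, losing up to n - k = 2 nodes still leaves enough coded symbols to rebuild the data, which is the redundancy that a recovery mechanism must then restore on replacement nodes.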