Robust Redundancy Scheme for the Repair Process: Hierarchical Codes in the Bandwidth-Limited Systems

Authors:
Zhen Huang; Yisong Lin; Yuxing Peng
Affiliations:
National Laboratory of Parallel and Distributed Processing, Department of Computer, National University of Defense Technology, Changsha, China 410073; Logistics Science and Research Institute, GLD, Beijing, China 100071; National Laboratory of Parallel and Distributed Processing, Department of Computer, National University of Defense Technology, Changsha, China 410073
Venue:
Journal of Grid Computing
Year:
2012

Abstract

Grid and cloud computing systems can support high-performance computing well. However, these systems must cope with failure risks: data is stored on "unreliable" storage nodes that can leave the system at any moment, and the nodes' network bandwidth is limited. In this setting, the basic way to ensure data reliability is to add redundancy using either replication or erasure codes. Compared with replication, erasure codes are more space-efficient. Erasure codes break data into blocks, encode these blocks, and distribute them across different storage nodes. When storage nodes permanently or temporarily leave the system, new redundant blocks must be created to guarantee data reliability; this is referred to as repair. Later, when the churned nodes rejoin the system, the blocks stored on them can be reintegrated into the data group to enhance data reliability. For "classical" erasure codes, generating a new block requires transmitting k blocks over the network, which leads to heavy repair traffic, high computational complexity, and a high failure probability for the repair process. A near-optimal erasure code named Hierarchical Codes has been proposed that can significantly reduce the repair traffic by reducing the number of nodes participating in the repair process, which is referred to as the repair degree d. To overcome the complexity of reintegration and provide adaptive reliability for Hierarchical Codes, we refine two concepts called location and relocation, and then propose an integrated maintenance scheme for the repair process. Our experiments show that Hierarchical Codes are the most robust redundancy scheme for the repair process compared with other well-known coding schemes.
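
The notion of repair degree can be illustrated with a small toy sketch. It is not the Hierarchical Codes construction from the paper; it assumes a simplified XOR-only layout in which k = 4 data blocks are split into two groups, each group has a local parity block, and one global parity block covers all data. The function names (`xor_blocks`, `encode`, `repair_local`, `repair_global`) and the block contents are hypothetical, chosen only for illustration. Repairing one lost block through its local group contacts d = 2 helper blocks, whereas repairing through the global parity contacts k = 4.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of a list of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def encode(data, group_size):
    """Return (local_parities, global_parity) for a list of data blocks.

    Assumption: one XOR parity per group plus one XOR parity over all data.
    """
    groups = [data[i:i + group_size] for i in range(0, len(data), group_size)]
    local = [xor_blocks(g) for g in groups]
    return local, xor_blocks(data)


def repair_local(data, local, lost, group_size):
    """Rebuild data[lost] from its own group; returns (block, repair degree)."""
    g = lost // group_size
    start = g * group_size
    helpers = [data[i] for i in range(start, start + group_size) if i != lost]
    helpers.append(local[g])
    return xor_blocks(helpers), len(helpers)


def repair_global(data, global_parity, lost):
    """Rebuild data[lost] from all other data blocks plus the global parity."""
    helpers = [b for i, b in enumerate(data) if i != lost]
    helpers.append(global_parity)
    return xor_blocks(helpers), len(helpers)


# Toy data: k = 4 blocks in two groups of 2 (illustrative values only).
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
local, global_parity = encode(data, group_size=2)

block_l, d_local = repair_local(data, local, lost=2, group_size=2)
block_g, d_global = repair_global(data, global_parity, lost=2)
print(d_local, d_global)
```

Both paths recover the same block, but the local path transfers half as many blocks over the network in this toy layout, which is the intuition behind reducing the repair degree d in bandwidth-limited systems.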