In-network redundancy generation for opportunistic speedup of data backup

  • Authors:
  • Lluis Pamies-Juarez; Anwitaman Datta; Frédérique Oggier

  • Venue:
  • Future Generation Computer Systems
  • Year:
  • 2013

Abstract

Erasure coding is a storage-efficient alternative to replication for achieving reliable data backup in distributed storage systems. During the storage process, traditional erasure codes require a unique source node to create and upload all the redundant data to the different storage nodes. However, such a source node may have limited communication and computation capabilities, which constrain the throughput of the storage process. Moreover, the source node and the different storage nodes might not be able to send and receive data simultaneously (e.g., nodes might be busy in a data center setting, or simply be offline in a peer-to-peer setting), which can further threaten the efficacy of the overall storage process. In this paper, we propose an "in-network" redundancy generation process which distributes the data insertion load among the source and storage nodes by allowing the storage nodes to generate new redundant data by exchanging partial information among themselves, improving the throughput of the storage process. The process is carried out asynchronously, utilizing spare bandwidth and computing resources from the storage nodes. The proposed approach leverages the local repairability property of newly proposed erasure codes, tailor-made for the needs of distributed storage systems. We analytically show that the performance of this technique relies on efficient usage of the spare node resources, and we derive a set of scheduling algorithms to maximize it. We experimentally show, using availability traces from real peer-to-peer applications as well as Google data center availability and workload traces, that our algorithms can, depending on the environmental characteristics, increase the throughput of the storage process significantly (up to 90% in data centers, and 60% in peer-to-peer settings) with respect to the classical naive data insertion approach.
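
To illustrate the idea behind in-network redundancy generation, the following is a minimal sketch, not the paper's actual codes or scheduling algorithms. It assumes a toy XOR parity per local group of two fragments (a crude stand-in for a locally repairable code); all fragment names and group layouts are hypothetical. It contrasts the classical approach, where the source computes and uploads every redundant fragment, with the in-network approach, where the source uploads only the original fragments and the storage nodes derive the local parities among themselves.

```python
# Toy sketch of "in-network" redundancy generation (illustrative only).
# Assumptions: fragments are equal-length byte strings, redundancy is a
# simple XOR parity per local group, and names/groups are hypothetical.
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Source data split into k = 4 systematic fragments.
fragments = {
    "d0": b"\x01" * 8, "d1": b"\x02" * 8,
    "d2": b"\x04" * 8, "d3": b"\x08" * 8,
}
local_groups = {"p01": ["d0", "d1"], "p23": ["d2", "d3"]}

# Classical insertion: the source itself computes every parity and
# uploads all six fragments (data + redundancy).
classical_uploads = dict(fragments)
for parity, group in local_groups.items():
    classical_uploads[parity] = reduce(xor, (fragments[g] for g in group))
print("source uploads (classical):", len(classical_uploads))   # 6

# In-network generation: the source uploads only the 4 data fragments;
# storage nodes in each local group later exchange their fragments and
# compute the parity themselves, off the source's critical path.
storage = dict(fragments)                     # uploaded by the source
for parity, group in local_groups.items():
    storage[parity] = reduce(xor, (storage[g] for g in group))
print("source uploads (in-network):", len(fragments))          # 4
```

In this toy setting the source's upload load drops from six fragments to four, with the remaining redundancy produced by node-to-node exchanges; the paper's contribution is in making such generation work with general locally repairable codes and in scheduling the exchanges to exploit spare node resources.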