XORing elephants: novel erasure codes for big data

Authors:
Maheswaran Sathiamoorthy; Megasthenis Asteris; Dimitris Papailiopoulos; Alexandros G. Dimakis; Ramkumar Vadali; Scott Chen; Dhruba Borthakur
Affiliations:
University of Southern California; University of Southern California; University of Texas at Austin; University of Texas at Austin; Facebook; Facebook; Facebook
Venue:
Proceedings of the VLDB Endowment
Year:
2013

Abstract

Distributed storage systems for large clusters typically use replication to provide reliability. Recently, erasure codes have been used to reduce the large storage overhead of three-replicated systems. Reed-Solomon codes are the standard design choice, and their high repair cost is often considered an unavoidable price to pay for high storage efficiency and high reliability. This paper shows how to overcome this limitation. We present a novel family of erasure codes that are efficiently repairable and offer higher reliability than Reed-Solomon codes. We show analytically that our codes are optimal on a recently identified tradeoff between locality and minimum distance. We implement our new codes in Hadoop HDFS and compare them to a currently deployed HDFS module that uses Reed-Solomon codes. Our modified HDFS implementation shows a reduction of approximately 2× in repair disk I/O and repair network traffic. The disadvantage of the new coding scheme is that it requires 14% more storage than Reed-Solomon codes, an overhead shown to be information-theoretically optimal for obtaining locality. Because the new codes repair failures faster, they provide higher reliability, which is orders of magnitude higher than that of replication.
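
The locality idea and the arithmetic behind the 14% and 2× figures can be illustrated with a minimal Python sketch. The abstract does not state code parameters, so everything here is an assumption made for illustration: 10 data blocks, 4 Reed-Solomon parities, 2 additionally stored local XOR parities, and local groups of 5 data blocks. The helper names (xor_blocks, make_local_parity, repair) are hypothetical and do not come from the HDFS implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_local_parity(group):
    """Local parity of a group: the XOR of all its data blocks."""
    return xor_blocks(group)


def repair(group, parity, lost_index):
    """Rebuild one lost block from the other blocks in its group plus the local parity."""
    survivors = [b for i, b in enumerate(group) if i != lost_index]
    return xor_blocks(survivors + [parity])


# Assumed parameters (not stated in the abstract).
DATA_BLOCKS = 10      # data blocks per stripe
RS_PARITIES = 4       # Reed-Solomon parity blocks
LOCAL_PARITIES = 2    # extra stored local XOR parities
GROUP_SIZE = 5        # data blocks covered by one local parity

rs_total = DATA_BLOCKS + RS_PARITIES        # blocks stored by plain Reed-Solomon
lrc_total = rs_total + LOCAL_PARITIES       # blocks stored with local parities added
extra_storage = lrc_total / rs_total - 1    # relative extra storage

rs_repair_reads = DATA_BLOCKS               # Reed-Solomon reads 10 blocks to rebuild one
lrc_repair_reads = GROUP_SIZE               # 4 group members + 1 local parity

if __name__ == "__main__":
    group = [bytes([i] * 8) for i in range(1, GROUP_SIZE + 1)]
    parity = make_local_parity(group)
    assert repair(group, parity, lost_index=2) == group[2]
    print(f"extra storage: {extra_storage:.1%}")
    print(f"repair read reduction: {rs_repair_reads / lrc_repair_reads:.1f}x")
```

Under these assumed parameters, a single lost data block is rebuilt from 5 blocks instead of 10, and the stripe grows from 14 to 16 stored blocks, which is consistent with the roughly 2× repair saving and 14% storage overhead quoted in the abstract.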