Rethinking Erasure Codes for Cloud File Systems: Minimizing I/O for Recovery and Degraded Reads

  • Authors:
  • Osama Khan, Department of Computer Science, Johns Hopkins University
  • Randal Burns, Department of Computer Science, Johns Hopkins University
  • James Plank, Dept. of Electrical Engineering and Computer Science, University of Tennessee
  • William Pierce, Dept. of Electrical Engineering and Computer Science, University of Tennessee
  • Cheng Huang, Microsoft Research

  • Venue:
  • FAST '12: Proceedings of the 10th USENIX Conference on File and Storage Technologies
  • Year:
  • 2012

Abstract

To reduce storage overhead, cloud file systems are transitioning from replication to erasure codes. This transition has revealed new dimensions on which to evaluate the performance of coding schemes: the amount of data that must be read during recovery and during degraded reads. We present an algorithm that finds the optimal number of codeword symbols needed for recovery for any XOR-based erasure code and produces recovery schedules that read a minimum amount of data. We differentiate popular erasure codes by this criterion and demonstrate that the differences improve I/O performance in practice for the large block sizes used in cloud file systems. Several cloud systems [15, 10] have adopted Reed-Solomon (RS) codes because of their generality and their ability to tolerate larger numbers of failures. We define a new class of rotated Reed-Solomon codes that perform degraded reads more efficiently than all known codes but otherwise inherit the reliability and performance properties of Reed-Solomon codes.
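
The flavor of the minimization problem can be illustrated with a small sketch. The Python below is a minimal, illustrative example, not the paper's algorithm: it brute-forces the smallest set of surviving symbols whose XOR reconstructs a lost symbol, whereas the paper enumerates decoding equations for arbitrary XOR-based codes far more systematically and also handles full recovery schedules. The toy code and all symbol names (d0..d3, P0, P1) are hypothetical.

    from itertools import combinations

    def min_recovery_set(lost_mask, survivors):
        # Brute-force search (exponential, but fine for a toy code) for the
        # smallest set of surviving symbols whose XOR equals the lost symbol.
        # Each symbol is modeled as a bitmask over the data symbols whose
        # XOR it stores, so XOR of masks mirrors XOR of symbol contents.
        names = list(survivors)
        for size in range(1, len(names) + 1):
            for combo in combinations(names, size):
                acc = 0
                for name in combo:
                    acc ^= survivors[name]
                if acc == lost_mask:
                    return combo
        return None  # not recoverable from the given survivors

    # Hypothetical toy code: four data symbols d0..d3 and two XOR parities.
    symbols = {
        "d0": 0b0001, "d1": 0b0010, "d2": 0b0100, "d3": 0b1000,
        "P0": 0b1111,  # d0 ^ d1 ^ d2 ^ d3
        "P1": 0b0011,  # d0 ^ d1
    }

    lost = "d0"
    survivors = {n: m for n, m in symbols.items() if n != lost}
    print(min_recovery_set(symbols[lost], survivors))
    # -> ('d1', 'P1'): two symbols read, versus four via d1 ^ d2 ^ d3 ^ P0

In this toy example, reading d1 and P1 halves the recovery I/O relative to the naive schedule through P0, which is the kind of difference the paper shows matters in practice at the large block sizes used in cloud file systems.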