Network coding for distributed storage systems

Authors:
Alexandros G. Dimakis;P. Brighten Godfrey;Yunnan Wu;Martin J. Wainwright;Kannan Ramchandran
Affiliations:
Department of Electrical Engineering-Systems, University of Southern California, Los Angeles, CA;Computer Science Department, University of Illinois at Urbana-Champaign, Urbana, IL;Microsoft Research, Redmond, WA;Department of Statistics, University of California, Berkeley, CA;Wireless Foundations, Department of EECS, University of California, Berkeley, CA
Venue:
IEEE Transactions on Information Theory
Year:
2010

Abstract

Distributed storage systems provide reliable access to data through redundancy spread over individually unreliable nodes. Application scenarios include data centers, peer-to-peer storage systems, and storage in wireless networks. Storing data using an erasure code, in fragments spread across nodes, requires less redundancy than simple replication for the same level of reliability. However, since fragments must be periodically replaced as nodes fail, a key question is how to generate encoded fragments in a distributed way while transferring as little data as possible across the network. In an erasure-coded system, a common practice for repairing a single node failure is for a new node to reconstruct the whole encoded data object in order to generate just one encoded block. We show that this procedure is suboptimal. We introduce the notion of regenerating codes, which allow a new node to download functions of the data stored at the surviving nodes. We show that regenerating codes can significantly reduce the repair bandwidth. Further, we show that there is a fundamental tradeoff between storage and repair bandwidth, which we characterize theoretically using flow arguments on an appropriately constructed graph. By invoking constructive results in network coding, we introduce regenerating codes that can achieve any point on this optimal tradeoff.
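The storage versus repair-bandwidth tradeoff can be made concrete with the cut-set condition commonly stated for regenerating codes. Assume a file of B units is stored so that each node holds α units, any k nodes suffice to rebuild the file, and a new node downloads β units from each of d ≥ k surviving nodes, giving repair bandwidth γ = dβ. Under this model (the symbol and function names here are assumptions for illustration), every data collector can be served only if Σ_{i=0}^{k-1} min(α, (d−i)β) ≥ B. The sketch below computes the smallest α allowed for a given γ, plus the two endpoints usually called the minimum-storage (MSR) and minimum-bandwidth (MBR) points. It only evaluates the flow bound; it does not construct a code.

```python
from fractions import Fraction
from typing import List, Optional, Tuple


def cut_value(alpha, beta, k: int, d: int) -> Fraction:
    """Cut-set quantity of the assumed model: sum_{i=0}^{k-1} min(alpha, (d - i) * beta)."""
    return sum((min(Fraction(alpha), (d - i) * Fraction(beta)) for i in range(k)), Fraction(0))


def is_feasible(alpha, beta, k: int, d: int, B) -> bool:
    """True if the cut-set bound permits reconstructing B units from any k nodes."""
    return cut_value(alpha, beta, k, d) >= B


def min_storage(gamma, k: int, d: int, B) -> Optional[Fraction]:
    """Smallest alpha meeting the bound when a new node downloads gamma = d * beta
    in total; returns None if no amount of storage works for this gamma."""
    beta = Fraction(gamma) / d
    caps = sorted((d - i) * beta for i in range(k))  # per-term caps, ascending
    prefix = Fraction(0)
    for j, cap in enumerate(caps):
        # For caps[j-1] <= alpha <= caps[j], the cut equals prefix + (k - j) * alpha.
        candidate = (Fraction(B) - prefix) / (k - j)
        if candidate <= cap:
            return candidate
        prefix += cap
    return None  # even unlimited alpha cannot push the cut up to B


def msr_point(k: int, d: int, B) -> Tuple[Fraction, Fraction]:
    """Minimum-storage point (alpha, gamma); assumes d >= k."""
    B = Fraction(B)
    return B / k, B * d / (k * (d - k + 1))


def mbr_point(k: int, d: int, B) -> Tuple[Fraction, Fraction]:
    """Minimum-bandwidth point (alpha, gamma), where alpha equals gamma."""
    B = Fraction(B)
    gamma = 2 * B * d / (2 * k * d - k * k + k)
    return gamma, gamma


def tradeoff_curve(k: int, d: int, B, steps: int = 5) -> List[Tuple[Fraction, Fraction]]:
    """Sample (gamma, minimum alpha) pairs from the MBR point to the MSR point."""
    g_lo = mbr_point(k, d, B)[1]
    g_hi = msr_point(k, d, B)[1]
    points = []
    for s in range(steps + 1):
        gamma = g_lo + (g_hi - g_lo) * s / steps
        points.append((gamma, min_storage(gamma, k, d, B)))
    return points


if __name__ == "__main__":
    B, k, d = 1, 10, 13  # hypothetical parameters, file size normalized to 1
    # Naive repair: download k fragments of size B/k, i.e. the whole object.
    print(f"naive repair: alpha={Fraction(B, k)}, gamma={Fraction(B)}")
    a, g = msr_point(k, d, B)
    print(f"MSR: alpha={a}, gamma={g}")
    a, g = mbr_point(k, d, B)
    print(f"MBR: alpha={a}, gamma={g}")
    for gamma, alpha in tradeoff_curve(k, d, B):
        print(f"gamma={float(gamma):.4f}  min alpha={float(alpha):.4f}")
```

With B = 1, k = 10 and d = 13, the MSR point keeps the per-node storage of an ordinary erasure code (1/10) while the repair bandwidth drops to 13/40 of the file, compared with the full file under naive repair; the MBR point lowers it further to 13/85 at the cost of storing 13/85 per node. When d = k, the MSR bandwidth equals B, so in this model the savings come from contacting more than k surviving nodes. Exact `Fraction` arithmetic is used so that the endpoints land exactly on the curve.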