Glacier: highly durable, decentralized storage despite massive correlated failures

Authors:
Andreas Haeberlen;Alan Mislove;Peter Druschel
Affiliations:
Department of Computer Science, Rice University;Department of Computer Science, Rice University;Department of Computer Science, Rice University
Venue:
NSDI'05 Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation - Volume 2
Year:
2005

Abstract

Decentralized storage systems aggregate the available disk space of participating computers to provide a large storage facility. These systems rely on data redundancy to ensure durable storage despite node failures. However, existing systems either assume independent node failures, or they rely on introspection to carefully place redundant data on nodes with low expected failure correlation. Unfortunately, node failures are not independent in practice, and constructing an accurate failure model is difficult in large-scale systems. At the same time, malicious worms that propagate through the Internet pose a real threat of large-scale correlated failures. Such rare but potentially catastrophic failures must be considered when attempting to provide highly durable storage. In this paper, we describe Glacier, a distributed storage system that relies on massive redundancy to mask the effect of large-scale correlated failures. Glacier is designed to aggressively minimize the cost of this redundancy in space and time: erasure coding and garbage collection reduce the storage cost; aggregation of small objects and a loosely coupled maintenance protocol for redundant fragments minimize the messaging cost. In one configuration, for instance, our system can provide six-nines durable storage despite correlated failures of up to 60% of the storage nodes, at the cost of an elevenfold storage overhead and an average messaging overhead of only 4 messages per node per minute during normal operation. Glacier is used as the storage layer for an experimental serverless email system.
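
The trade-off between storage overhead and durability under a correlated failure can be illustrated with a small calculation. The following is a minimal sketch and not the paper's own analysis: it assumes an object is erasure-coded into n fragments stored on distinct nodes, that any r fragments suffice to rebuild it, and that a correlated failure destroys each fragment independently with probability f. The values n = 55 and r = 5 are hypothetical, chosen only because they give an elevenfold overhead; the function names are likewise illustrative.

```python
from math import comb


def loss_probability(n: int, r: int, f: float) -> float:
    """Probability that fewer than r of n fragments survive.

    Assumption: each fragment is lost independently with probability f.
    The object is unrecoverable when at most r - 1 fragments remain.
    """
    survive = 1.0 - f
    return sum(comb(n, k) * survive**k * f ** (n - k) for k in range(r))


def storage_overhead(n: int, r: int) -> float:
    """Stored bytes per byte of original data for an (n, r) erasure code."""
    return n / r


if __name__ == "__main__":
    # Hypothetical parameters, not taken from the paper.
    n, r, f = 55, 5, 0.6
    print(f"storage overhead: {storage_overhead(n, r):.1f}x")
    print(f"loss probability: {loss_probability(n, r, f):.2e}")
```

Under these assumptions the loss probability stays below one in a million even when 60% of the fragments are destroyed, which is the order of magnitude that "six nines" refers to. Plain replication with the same overhead (11 copies) would lose the object with probability 0.6 to the 11th power, roughly 0.4%, which is why a many-fragment erasure code is attractive in this setting.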