Replication degree customization for high availability

Authors:
Ming Zhong; Kai Shen; Joel Seiferas
Affiliations:
Google, Inc., USA; University of Rochester, Rochester, NY, USA; University of Rochester, Rochester, NY, USA
Venue:
Proceedings of the 3rd ACM SIGOPS/EuroSys European Conference on Computer Systems 2008
Year:
2008

References:
31
Cited by:
6

Abstract

Object replication is a common approach to enhancing the availability of distributed data-intensive services and storage systems. Many such systems are known to have highly skewed object request probability distributions. In this paper, we propose an object replication degree customization scheme that maximizes the expected service availability under given object request probabilities, object sizes, and space constraints (e.g., memory/storage capacities). In particular, we discover that the optimal replication degree of an object should be linear in the logarithm of its popularity-to-size ratio. We also study the feasibility and effectiveness of our proposed scheme using applications driven by real-life object request traces and machine failure traces. When the data object popularity distribution is known a priori, our proposed customization achieves an increase of 1.32 to 2.92 "nines" in system availability (or 21% to 74% space savings at the same availability level) compared to uniform replication. Results also suggest that our scheme requires a moderate amount of replica creation/removal overhead under realistic changes in object request popularity: weekly changes involve no more than 0.24% of objects and no more than 0.11% of total data size.
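To make the central result concrete, here is a minimal Python sketch under a simple model assumed purely for illustration (not necessarily the paper's exact formulation): each replica is independently unavailable with the same probability f, so an object with r_i replicas is unavailable with probability f^r_i, and the expected unavailability is the sum of p_i * f^r_i subject to the space constraint sum of s_i * r_i = S. Because f^r is convex in r, setting the Lagrangian's derivative to zero yields the global real-valued optimum r_i = (ln(p_i / s_i) - c) / ln(1/f), which is linear in the logarithm of the popularity-to-size ratio. All function names (`customized_degrees`, `budget_customized`, etc.) and the numbers in the usage example are hypothetical.

```python
import math
from typing import List, Sequence

# Assumed model (illustrative only): independent replica failures with a
# common per-replica unavailability f, popularities p_i summing to 1.

def customized_degrees(p: Sequence[float], s: Sequence[float],
                       budget: float, f: float) -> List[float]:
    """Real-valued degrees r_i minimizing sum_i p_i * f**r_i
    subject to sum_i s_i * r_i == budget (closed-form Lagrange solution)."""
    L = math.log(1.0 / f)                            # ln(1/f) > 0 for 0 < f < 1
    g = [math.log(pi / si) for pi, si in zip(p, s)]  # log popularity-to-size ratio
    total = sum(s)
    # Common offset chosen so the space constraint holds with equality.
    c = (sum(si * gi for si, gi in zip(s, g)) - budget * L) / total
    # Optimal degree is linear in log(p_i / s_i).
    return [(gi - c) / L for gi in g]

def unavailability(p: Sequence[float], r: Sequence[float], f: float) -> float:
    """Expected fraction of requests that find every replica of their object down."""
    return sum(pi * f ** ri for pi, ri in zip(p, r))

def nines(u: float) -> float:
    """Availability in "nines": unavailability 1e-3 (99.9% available) -> 3."""
    return -math.log10(u)

def budget_customized(p: Sequence[float], s: Sequence[float],
                      f: float, target: float) -> float:
    """Total replica space the customized scheme needs to reach `target`."""
    L = math.log(1.0 / f)
    total = sum(s)
    # At the optimum every p_i * f**r_i equals target * s_i / total.
    c = math.log(target / total)
    return (sum(si * math.log(pi / si) for pi, si in zip(p, s)) - c * total) / L

def budget_uniform(s: Sequence[float], f: float, target: float) -> float:
    """Space uniform replication needs: f**r == target (popularities sum to 1)."""
    return math.log(1.0 / target) / math.log(1.0 / f) * sum(s)

if __name__ == "__main__":
    p = [0.7, 0.2, 0.1]  # hypothetical request probabilities
    s = [1.0, 1.0, 1.0]  # hypothetical object sizes
    f = 0.1              # hypothetical per-replica unavailability
    r = customized_degrees(p, s, budget=9.0, f=f)
    print(r)  # about [3.46, 2.92, 2.62]: the popular object gets more replicas
    print(nines(unavailability(p, [3.0] * 3, f)),  # uniform: 3.0 nines
          nines(unavailability(p, r, f)))          # customized: about 3.14 nines
    target = 1e-3
    print(1 - budget_customized(p, s, f, target) / budget_uniform(s, f, target))
```

The sketch deliberately ignores integrality, the requirement that every object keep at least one replica (very unpopular or very large objects can receive a degree below 1, or even negative), per-object caps on the number of machines, and correlated failures; a practical scheme would need to round and clamp the degrees. The last line illustrates the "space savings at the same availability level" comparison: the fraction of space the customized degrees save while matching the uniform scheme's unavailability.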