Adaptive and dynamic funnel replication in clouds

Authors:
Guy Laden;Roie Melamed;Ymir Vigfusson
Affiliations:
IBM Research Haifa, Israel;IBM Research Haifa, Israel;School of CS, Reykjavik University
Venue:
ACM SIGOPS Operating Systems Review
Year:
2012

Abstract

We consider the problem of strongly consistent replication in a multi-data-center cloud setting. This environment is characterized by high-latency communication between data centers, significant fluctuations in the performance of seemingly identical virtual machines (VMs), and temporary disconnections of data centers from the rest of the cloud. In this paper we introduce the adaptive and dynamic Funnel Replication (FR) protocol, which is designed to achieve high throughput and low latency for reads, to accommodate arbitrary latency/throughput tradeoffs for writes, to maximize performance in the face of VM performance variations, and to provide high availability for read requests in the presence of network partitions. FR is based on the idea of flexible write dissemination topologies, which enable it to achieve, per message, the desired tradeoff between latency and throughput, depending on the message size, the observed network conditions, and the importance of latency as indicated by the client. We demonstrate the benefits of flexible dissemination topologies and show that, in a cloud setting with N identical replicas (N ≥ 2), FR can improve write latency by up to a factor of N/2 compared to the well-known chain replication (CR) protocol, at the expense of a slight decrease in write throughput. In a setting with potentially high variability in replica performance, such as Amazon EC2, FR can achieve up to 16 times higher throughput than CR while also improving latency. FR does this by adopting a topology that consists of concurrent disjoint data replication paths, so that load on high-throughput paths is adaptively increased while load on congested replicas is reduced.
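
The N/2 write-latency factor mentioned in the abstract can be illustrated with a toy hop-count model. The sketch below is not taken from the paper. It assumes every inter-replica hop costs the same, that a chain forwards a write through its N - 1 links in sequence and then returns one acknowledgement, and that the lowest-latency dissemination topology is a direct fan-out from the first replica followed by parallel acknowledgements. The function names and the 50 ms hop cost are hypothetical.

```python
def chain_write_latency(n: int, hop_ms: float) -> float:
    """Toy model (assumption): the write crosses n - 1 links one after
    another, then a single acknowledgement hop returns."""
    if n < 2:
        raise ValueError("model assumes at least 2 replicas")
    return (n - 1) * hop_ms + hop_ms


def fanout_write_latency(n: int, hop_ms: float) -> float:
    """Toy model (assumption): the first replica sends to all others in
    parallel (one hop), and their acknowledgements return in parallel
    (one hop)."""
    if n < 2:
        raise ValueError("model assumes at least 2 replicas")
    return 2 * hop_ms


def latency_improvement(n: int, hop_ms: float = 50.0) -> float:
    """Ratio of chain latency to fan-out latency under the toy model."""
    return chain_write_latency(n, hop_ms) / fanout_write_latency(n, hop_ms)


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, latency_improvement(n))
```

Under these assumptions the ratio works out to N/2, which is consistent with the bound stated in the abstract. The model ignores bandwidth: a fan-out sender transmits N - 1 copies of each message, which is one way to picture the throughput cost that the abstract trades against latency.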