Mitigating High Latency Outliers for Cloud-Based Telecommunication Services

Authors:
Fangzhe Chang; Peter S. Fales; Moritz Steiner; Ramesh Viswanathan; Thomas J. Williams; Thomas L. Wood
Affiliations:
Bell Labs, Murray Hill, New Jersey; Bell Labs Service Infrastructure research department, Naperville, Illinois; Bell Labs, Murray Hill, New Jersey; Bell Labs' Enabling Computing Technologies research domain, Murray Hill, New Jersey; Bell Labs' Service Infrastructure Research Domain, Columbus, Ohio; Bell Labs' Enabling Computing Technologies research domain, Holmdel, New Jersey
Venue:
Bell Labs Technical Journal
Year:
2012

Abstract

Telecommunication applications are distinguished by their stringent requirements for availability and completion times. A highly available, low-latency, distributed data store is therefore a critical component of any cloud-based realization of a telecommunication service. We present a systematic experimental evaluation of state-of-the-art database systems as components of telecommunication applications. We show that while their average latencies are well within the required time scales, their latency distributions exhibit a long tail of unacceptably large outliers that may significantly impair the ability of telecommunication applications to meet their performance requirements. To address these high-latency outliers, we present a new solution implemented in a Bell Labs system code-named Flurry. Flurry uses the first response received from a replica rather than waiting for responses from all replicas or from a quorum. To handle incorrect responses arising from message losses, Flurry uses a novel checking algorithm based on vector clocks to determine whether a replica's response is correct. Our experimental results show that Flurry significantly reduces both the average response time and the probability of unacceptable response times, to levels that would allow telecommunication services to meet their required availability and completion-time thresholds. © 2012 Alcatel-Lucent. © 2012 Wiley Periodicals, Inc.
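The abstract does not spell out Flurry's checking algorithm, so the following Python sketch only illustrates the general idea under stated assumptions. A client queries every replica at once, takes the earliest reply, and uses a vector-clock comparison to reject a reply from a replica that missed an update (for example, because a message was lost). The names `first_valid_read`, `dominates`, and `make_replica`, the `(value, clock)` reply format, and the assumption that the client holds a `required` clock from its own last write are all hypothetical and are not taken from the paper.

```python
"""Sketch: accept the first replica reply whose vector clock proves it is fresh enough.

Hypothetical illustration only; this is not Flurry's published algorithm.
"""
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Optional, Tuple

VectorClock = Dict[str, int]      # node id -> number of updates seen from that node
Reply = Tuple[str, VectorClock]   # (value, clock the replica attaches to it); assumed format


def dominates(a: VectorClock, b: VectorClock) -> bool:
    """True if clock `a` reflects every update recorded in clock `b` (a >= b component-wise)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def first_valid_read(replicas: List[Callable[[], Reply]],
                     required: VectorClock) -> Optional[str]:
    """Query all replicas in parallel and return the earliest reply whose clock
    dominates `required`. Stale replies (e.g. from a replica that missed an
    update because a message was lost) are skipped. Returns None if no reply qualifies."""
    if not replicas:
        return None
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    try:
        futures = [pool.submit(replica) for replica in replicas]
        for fut in as_completed(futures):
            try:
                value, clock = fut.result()
            except Exception:
                continue              # a failed replica counts as no response
            if dominates(clock, required):
                return value          # no quorum wait: the first fresh reply wins
        return None
    finally:
        pool.shutdown(wait=False)     # return without blocking on slower replicas


def make_replica(delay: float, value: str, clock: VectorClock) -> Callable[[], Reply]:
    """Hypothetical simulated replica that answers after `delay` seconds."""
    def call() -> Reply:
        time.sleep(delay)
        return value, clock
    return call


if __name__ == "__main__":
    # Assumption: the client's last write was stamped {"A": 2, "B": 1}
    # and that clock was returned to the client by the write path.
    required = {"A": 2, "B": 1}
    replicas = [
        make_replica(0.01, "old", {"A": 1, "B": 1}),  # fastest, but missed the write
        make_replica(0.10, "new", {"A": 2, "B": 1}),  # up to date
        make_replica(0.60, "new", {"A": 2, "B": 1}),  # slow outlier, never waited for
    ]
    start = time.monotonic()
    result = first_valid_read(replicas, required)
    print(result, round(time.monotonic() - start, 2))
```

In this sketch, rejecting a stale fast reply costs only the wait for the next reply, whereas a quorum read must always wait for the k-th fastest reply and so inherits the latency of slower replicas. How Flurry actually obtains the reference clock and handles the case where no reply qualifies is not described in the abstract.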