Zoolander: efficient latency management in NoSQL stores

  • Authors:
  • Aniket Chakrabarti;Christopher Stewart;Daiyi Yang;Rean Griffith

  • Affiliations:
  • The Ohio State University;The Ohio State University;StumbleUpon.com;VMware

  • Venue:
  • Proceedings of the Posters and Demo Track
  • Year:
  • 2012

Abstract

NoSQL stores expose narrow APIs for data access, e.g., get(key) or put(key, val). While these APIs often give up strong consistency and transactions, they can scale throughput under intense workloads. Widely used stores, e.g., Apache Zookeeper, Cassandra, and Memcached, have been shown to sustain 10^10 accesses per day in the face of workload shifts, software faults, and performance bugs. However, providing low latency for every access remains challenging. Latency, unlike throughput, quickly yields diminishing returns under scale-out approaches, making it important to choose the most efficient approach. Further, DNS timeouts, garbage collection, and other rare system events can hold resources from time to time [3]. These events hardly impact throughput, but they can substantially increase latency for some accesses. Internet services that access large volumes of data under tight response time demands need NoSQL stores that provide low latency all the time. Figure 1 depicts such services and their demands. We describe them below.

1. Old-school services, such as e-commerce websites, are increasingly using NoSQL stores instead of databases. In these embarrassingly parallel services, end-user requests access data independently, but each request must complete quickly. Slow accesses translate to unhappy end users.

2. Map-reduce services spawn parallel worker nodes that compute local results and forward them to reducers to produce the final output. Here, the term "service" reflects a growing trend where jobs are expected to complete within response time targets. We note two reasons for this trend. First, jobs that run on pay-as-you-go clouds cost less if they finish within 1-hour leasing intervals. Second, jobs that finish quickly offer qualitative business advantages, e.g., real-time Twitter analysis.

3. Scientific computing as a service is an emerging workload on public clouds [2]. In the past, these workloads ran only on private, custom hardware, but public clouds can offer performance-to-cost efficiency. The challenge is matching the absolute performance of private hardware. These workloads use barriers and synchronization heavily. One slow data access can delay the completion of a barrier and ultimately delay the entire workload.
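The claim that rare events hardly move throughput yet inflate tail latency can be illustrated with a toy simulation. This sketch is our own and is not from the paper: the function `access_latency_ms`, the 200 ms stall, and the 0.5% hiccup rate are all hypothetical parameters chosen only to make the mean/tail gap visible.

```python
import random

random.seed(1)

def access_latency_ms(hiccup_prob=0.005):
    """Hypothetical model of one get(key): ~1 ms normally, but a rare
    event (e.g., a GC pause or DNS timeout) adds a large stall."""
    latency = random.uniform(0.5, 1.5)
    if random.random() < hiccup_prob:
        latency += 200.0  # assumed stall duration; purely illustrative
    return latency

# Simulate 100,000 independent accesses.
samples = sorted(access_latency_ms() for _ in range(100_000))
mean = sum(samples) / len(samples)
p999 = samples[int(0.999 * len(samples))]

# The mean (a throughput-style aggregate) stays near 1-2 ms,
# while the 99.9th percentile jumps by two orders of magnitude.
print(f"mean: {mean:.2f} ms, 99.9th percentile: {p999:.2f} ms")
```

Under these assumed numbers, average latency barely moves, but the 99.9th percentile captures the full stall, which is why per-access latency targets are much harder to meet than throughput targets.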