Cake: enabling high-level SLOs on shared storage systems

Authors:
Andrew Wang;Shivaram Venkataraman;Sara Alspaugh;Randy Katz;Ion Stoica
Affiliations:
University of California, Berkeley;University of California, Berkeley;University of California, Berkeley;University of California, Berkeley;University of California, Berkeley
Venue:
Proceedings of the Third ACM Symposium on Cloud Computing
Year:
2012

Abstract

Cake is a coordinated, multi-resource scheduler for shared distributed storage environments with the goal of achieving both high throughput and bounded latency. Cake uses a two-level scheduling scheme to enforce high-level service-level objectives (SLOs). First-level schedulers control consumption of resources such as disk and CPU. These schedulers (1) provide mechanisms for differentiated scheduling, (2) split large requests into smaller chunks, and (3) limit the number of outstanding device requests, which together allow for effective control over multi-resource consumption within the storage system. Cake's second-level scheduler coordinates the first-level schedulers to map high-level SLO requirements into actual scheduling parameters. These parameters are dynamically adjusted over time to enforce high-level performance specifications for changing workloads. We evaluate Cake using multiple workloads derived from real-world traces. Our results show that Cake allows application programmers to explore the latency vs. throughput trade-off by setting different high-level performance requirements on their workloads. Furthermore, we show that using Cake has concrete economic and business advantages, reducing provisioning costs by up to 50% for a consolidated workload and reducing the completion time of an analytics cycle by up to 40%.
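
The two-level structure described in the abstract can be illustrated with a small Python sketch. This is not Cake's implementation: the class and function names, the 0.05 adjustment step, the 0.8 slack factor, and the share bounds are all assumptions chosen for illustration. It shows the three first-level mechanisms in miniature (a proportional share for differentiated scheduling, request chunking, and a cap on outstanding device requests) and a simple feedback step standing in for the second-level scheduler.

```python
from dataclasses import dataclass


def split_into_chunks(size_bytes: int, chunk_bytes: int) -> list[int]:
    """Split one large request into chunks no larger than chunk_bytes."""
    if size_bytes <= 0 or chunk_bytes <= 0:
        raise ValueError("sizes must be positive")
    full, rest = divmod(size_bytes, chunk_bytes)
    return [chunk_bytes] * full + ([rest] if rest else [])


@dataclass
class FirstLevelScheduler:
    """Hypothetical per-resource scheduler (for example disk or CPU)."""
    name: str
    latency_share: float = 0.5  # assumed share given to the latency-sensitive client
    max_outstanding: int = 4    # assumed cap on requests in flight at the device
    outstanding: int = 0

    def try_dispatch(self) -> bool:
        """Admit one chunk only if the outstanding-request cap allows it."""
        if self.outstanding >= self.max_outstanding:
            return False
        self.outstanding += 1
        return True

    def complete(self) -> None:
        """Record that one in-flight chunk has finished."""
        self.outstanding -= 1


def adjust_shares(schedulers, measured_latency_ms, slo_latency_ms,
                  step=0.05, lo=0.1, hi=0.9):
    """One illustrative second-level control step.

    Raises the latency-sensitive share on every first-level scheduler when
    the SLO is missed, lowers it when there is ample slack, and otherwise
    leaves it unchanged. The thresholds are assumptions, not Cake's policy.
    """
    if measured_latency_ms > slo_latency_ms:
        delta = step
    elif measured_latency_ms < 0.8 * slo_latency_ms:
        delta = -step
    else:
        delta = 0.0
    for s in schedulers:
        s.latency_share = min(hi, max(lo, s.latency_share + delta))


if __name__ == "__main__":
    schedulers = [FirstLevelScheduler("disk"), FirstLevelScheduler("cpu")]
    adjust_shares(schedulers, measured_latency_ms=120.0, slo_latency_ms=100.0)
    print([(s.name, round(s.latency_share, 2)) for s in schedulers])
    print(split_into_chunks(10 * 1024, 4 * 1024))
```

In this sketch, shrinking the chunk size or the outstanding-request cap limits how long a latency-sensitive request can wait behind throughput-oriented work already issued to the device, while the feedback step returns resources to the throughput-oriented workload whenever the latency target is met with room to spare.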