The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Resource containers: a new facility for resource management in server systems
OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
Learning from Incomplete Data
Distributed caching with memcached
Linux Journal
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Exploiting Platform Heterogeneity for Power Efficient Data Centers
ICAC '07 Proceedings of the Fourth International Conference on Autonomic Computing
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Workload Analysis and Demand Prediction of Enterprise Data Center Applications
IISWC '07 Proceedings of the 2007 IEEE 10th International Symposium on Workload Characterization
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
Q-clouds: managing performance interference effects for QoS-aware clouds
Proceedings of the 5th European conference on Computer systems
Improving MapReduce performance in heterogeneous environments
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Reining in the outliers in map-reduce clusters using Mantri
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Classification with Incomplete Data Using Dirichlet Process Priors
The Journal of Machine Learning Research
Mesos: a platform for fine-grained resource sharing in the data center
Proceedings of the 8th USENIX conference on Networked systems design and implementation
Dominant resource fairness: fair allocation of multiple resource types
Proceedings of the 8th USENIX conference on Networked systems design and implementation
Data Mining: Practical Machine Learning Tools and Techniques
Data Mining: Practical Machine Learning Tools and Techniques
Vantage: scalable and efficient fine-grain cache partitioning
Proceedings of the 38th annual international symposium on Computer architecture
Power management of online data-intensive services
Proceedings of the 38th annual international symposium on Computer architecture
A Cost-Aware Elasticity Provisioning System for the Cloud
ICDCS '11 Proceedings of the 2011 31st International Conference on Distributed Computing Systems
Warehouse-Scale Computing: Entering the Teenage Decade
Proceedings of the 38th annual international symposium on Computer architecture
CloudScale: elastic resource scaling for multi-tenant cloud systems
Proceedings of the 2nd ACM Symposium on Cloud Computing
Mining of Massive Datasets
Tarazu: optimizing MapReduce on heterogeneous clusters
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
DejaVu: accelerating resource allocation in virtualized environments
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Bubble-Up: increasing utilization in modern warehouse scale computers via sensible co-locations
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Heterogeneity and dynamicity of clouds at scale: Google trace analysis
Proceedings of the Third ACM Symposium on Cloud Computing
Communications of the ACM
Paragon: QoS-aware scheduling for heterogeneous datacenters
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Omega: flexible, scalable schedulers for large compute clusters
Proceedings of the 8th ACM European Conference on Computer Systems
CPI2: CPU performance isolation for shared compute clusters
Proceedings of the 8th ACM European Conference on Computer Systems
Effective straggler mitigation: attack of the clones
nsdi'13 Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation
Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers
Proceedings of the 40th Annual International Symposium on Computer Architecture
Whare-map: heterogeneity in "homogeneous" warehouse-scale computers
Proceedings of the 40th Annual International Symposium on Computer Architecture
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
Hi-index | 0.00 |
Cloud computing promises flexibility and high performance for users and high cost-efficiency for operators. Nevertheless, most cloud facilities operate at very low utilization, hurting both cost effectiveness and future scalability. We present Quasar, a cluster management system that increases resource utilization while providing consistently high application performance. Quasar employs three techniques. First, it does not rely on resource reservations, which lead to underutilization as users do not necessarily understand workload dynamics and physical resource requirements of complex codebases. Instead, users express performance constraints for each workload, letting Quasar determine the right amount of resources to meet these constraints at any point. Second, Quasar uses classification techniques to quickly and accurately determine the impact of the amount of resources (scale-out and scale-up), type of resources, and interference on performance for each workload and dataset. Third, it uses the classification results to jointly perform resource allocation and assignment, quickly exploring the large space of options for an efficient way to pack workloads on available resources. Quasar monitors workload performance and adjusts resource allocation and assignment when needed. We evaluate Quasar over a wide range of workload scenarios, including combinations of distributed analytics frameworks and low-latency, stateful services, both on a local cluster and a cluster of dedicated EC2 servers. At steady state, Quasar improves resource utilization by 47% in the 200-server EC2 cluster, while meeting performance constraints for workloads of all types.