Heterogeneity and dynamicity of clouds at scale: Google trace analysis

Authors:
Charles Reiss; Alexey Tumanov; Gregory R. Ganger; Randy H. Katz; Michael A. Kozuch
Affiliations:
University of California, Berkeley; Carnegie Mellon University; Carnegie Mellon University; University of California, Berkeley; Intel Labs
Venue:
Proceedings of the Third ACM Symposium on Cloud Computing
Year:
2012


Abstract

To better understand the challenges in developing effective cloud-based resource schedulers, we analyze the first publicly available trace data from a sizable multi-purpose cluster. The most notable workload characteristic is heterogeneity, both in resource types (e.g., the cores:RAM ratio per machine) and in their usage (e.g., task duration and resources needed). Such heterogeneity reduces the effectiveness of traditional slot- and core-based scheduling. Furthermore, some tasks are constrained in the kinds of machines they can use, which increases the complexity of resource assignment and complicates task migration. The workload is also highly dynamic, varying over time and along most workload features, and is driven by many short jobs that demand quick scheduling decisions. While few simplifying assumptions apply, we find that many longer-running jobs have relatively stable resource utilizations, which can help adaptive resource schedulers.
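To make the point about slot-based scheduling concrete, here is a minimal sketch under assumptions that are ours, not the paper's. The hypothetical "core-based slot" scheduler cuts each machine into one slot per core, where each slot gets 1 core plus an equal share of the machine's RAM and holds exactly one task. It is compared against a simple first-fit packer that checks CPU and RAM independently; first-fit is a stand-in technique chosen for illustration, not the scheduler studied in the paper. The machine sizes and task demands are made-up numbers, not values from the trace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    cores: float
    ram_gb: float


@dataclass(frozen=True)
class Task:
    cpu: float
    ram_gb: float


def place_slot_based(machines, tasks):
    """Hypothetical core-based slot scheduler: every machine is cut into
    int(cores) identical slots of (1 core, ram_gb / cores GB), and a task
    occupies one whole slot if its demand fits inside that slot."""
    free_slots = []  # (slot_cpu, slot_ram_gb) for every free slot in the cluster
    for m in machines:
        per_slot_ram = m.ram_gb / m.cores
        free_slots.extend([(1.0, per_slot_ram)] * int(m.cores))
    placed = 0
    for t in tasks:
        for i, (slot_cpu, slot_ram) in enumerate(free_slots):
            if t.cpu <= slot_cpu and t.ram_gb <= slot_ram:
                free_slots.pop(i)  # slot is consumed even if partly unused
                placed += 1
                break  # stop scanning after popping, so the list is not reused
    return placed


def place_multi_resource(machines, tasks):
    """First-fit over two dimensions: a task goes on the first machine with
    enough remaining CPU *and* RAM, consuming only what it asks for."""
    remaining = [[m.cores, m.ram_gb] for m in machines]
    placed = 0
    for t in tasks:
        for r in remaining:
            if t.cpu <= r[0] and t.ram_gb <= r[1]:
                r[0] -= t.cpu
                r[1] -= t.ram_gb
                placed += 1
                break
    return placed


if __name__ == "__main__":
    # Hypothetical cluster: one RAM-poor and one RAM-rich machine (different cores:RAM).
    machines = [Machine(cores=8, ram_gb=16), Machine(cores=8, ram_gb=64)]
    # Hypothetical memory-heavy tasks: half a core, 6 GB each.
    tasks = [Task(cpu=0.5, ram_gb=6)] * 12
    print(place_slot_based(machines, tasks))      # 8
    print(place_multi_resource(machines, tasks))  # 12
```

In this toy setup, the memory-heavy tasks fit only in the RAM-rich machine's 8 GB slots, so the slot model places 8 of 12 tasks while 16 GB of RAM sits idle on each machine; checking CPU and RAM separately places all 12. Real schedulers differ in many ways, so treat this only as an illustration of how mismatched cores:RAM ratios strand resources under fixed slot shapes.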