D-factor: a quantitative model of application slow-down in multi-resource shared systems

Authors:
Seung-Hwan Lim;Jae-Seok Huh;Youngjae Kim;Galen M. Shipman;Chita R. Das
Affiliations:
The Pennsylvania State University, University Park, PA, USA;Oak Ridge National Laboratory, Oak Ridge, TN, USA;Oak Ridge National Laboratory, Oak Ridge, TN, USA;Oak Ridge National Laboratory, Oak Ridge, TN, USA;The Pennsylvania State University, University Park, PA, USA
Venue:
Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
Year:
2012

Abstract

Scheduling multiple jobs onto a platform enhances system utilization by sharing resources. The benefits of higher resource utilization include a reduced cost of constructing, operating, and maintaining a system, a cost that often includes energy consumption. Maximizing these benefits while satisfying performance limits comes at a price: resource contention among jobs increases job completion time. In this paper, we analyze the slow-down of jobs due to contention for multiple resources in a system, which we refer to as the dilation factor. We observe that multiple-resource contention produces non-linear dilation factors. From this observation, we establish a general quantitative model for the dilation factors of jobs in multi-resource systems. A job is characterized by vector-valued loading statistics, and the dilation factors of a job set are given by a quadratic function of their loading vectors. We demonstrate how to systematically characterize a job, maintain the data structure used to calculate dilation factors (the loading matrix), and calculate the dilation factor of each job. We validated the accuracy of the model with multiple processes running on a native Linux server, with virtualized servers, and with multiple MapReduce workloads co-scheduled in a cluster. Evaluation with measured data shows that the D-factor model has an error margin of less than 16%. We also show that the model can be integrated with an existing on-line scheduler to minimize the makespan of workloads.
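The abstract gives the shape of the model (loading vectors per job, a loading matrix, dilation factors that are quadratic in the loading vectors) but not its exact equations. The sketch below illustrates one possible quadratic form purely as an assumption, not the paper's formulation: each job i has a loading vector x_i over r resources, the rows of `loads` stand in for the loading matrix, and a hypothetical r-by-r interference-weight matrix W gives d_i = 1 + sum over j != i of x_i^T W x_j, so a job running alone has d_i = 1. The functions `dilation_factors`, `machine_finish_time`, and `greedy_place` are illustrative names; `greedy_place` is a simple greedy stand-in for "an existing on-line scheduler", and it further assumes that all jobs on a machine run concurrently from start to finish.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]
Job = Tuple[float, Vector]  # (stand-alone runtime, loading vector): hypothetical job record


def bilinear(x: Vector, w: Matrix, y: Vector) -> float:
    """Return x^T W y for loading vectors x, y and an r-by-r weight matrix W."""
    return sum(x[a] * w[a][b] * y[b] for a in range(len(x)) for b in range(len(y)))


def dilation_factors(loads: Sequence[Vector], w: Matrix) -> List[float]:
    """Assumed quadratic model: d_i = 1 + sum_{j != i} x_i^T W x_j.

    `loads` plays the role of a loading matrix: one row per job,
    one column per resource. W is a hypothetical interference matrix.
    """
    factors = []
    for i, xi in enumerate(loads):
        interference = sum(bilinear(xi, w, xj) for j, xj in enumerate(loads) if j != i)
        factors.append(1.0 + interference)
    return factors


def predicted_runtimes(solo_times: Sequence[float], loads: Sequence[Vector], w: Matrix) -> List[float]:
    """Scale each job's stand-alone runtime by its predicted dilation factor."""
    return [t * d for t, d in zip(solo_times, dilation_factors(loads, w))]


def machine_finish_time(jobs: Sequence[Job], w: Matrix) -> float:
    """Crude assumption: co-located jobs start together and overlap for their whole run,
    so a machine finishes when its slowest dilated job finishes."""
    if not jobs:
        return 0.0
    solo = [t for t, _ in jobs]
    loads = [x for _, x in jobs]
    return max(predicted_runtimes(solo, loads, w))


def greedy_place(arrivals: Sequence[Job], num_machines: int, w: Matrix) -> List[List[Job]]:
    """Hypothetical on-line placement: put each arriving job on the machine that
    minimizes the predicted makespan across all machines after placement."""
    machines: List[List[Job]] = [[] for _ in range(num_machines)]
    for job in arrivals:
        best_m, best_span = 0, float("inf")
        for m in range(num_machines):
            trial = machine_finish_time(machines[m] + [job], w)
            others = [machine_finish_time(machines[k], w) for k in range(num_machines) if k != m]
            span = max([trial] + others)
            if span < best_span:
                best_m, best_span = m, span
        machines[best_m].append(job)
    return machines
```

For example, with two resources, W = diag(0.5, 1.0), and loading vectors (0.8, 0.1) and (0.2, 0.9), each job's predicted dilation under this assumed form is 1 + 0.8·0.5·0.2 + 0.1·1.0·0.9 = 1.17. Given two machines, `greedy_place` separates the two jobs, because co-locating them would raise the predicted makespan from 10 to 11.7 for 10-unit jobs.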