A practical method for estimating performance degradation on multicore processors, and its application to HPC workloads

Authors:
Tyler Dwyer;Alexandra Fedorova;Sergey Blagodurov;Mark Roth;Fabien Gaud;Jian Pei
Affiliations:
Simon Fraser University;Simon Fraser University;Simon Fraser University;Simon Fraser University;Simon Fraser University;Simon Fraser University
Venue:
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Year:
2012

Citing 14
Cited 4

Symbiotic jobscheduling with priorities for a simultaneous multithreading processor

SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors

Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Using OS Observations to Improve Performance in Multicore Systems

IEEE Micro
Analysis and approximation of optimal co-scheduling on chip multiprocessors

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
ITCA: Inter-task Conflict-Aware CPU Accounting for CMPs

PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Addressing shared resource contention in multicore processors via scheduling

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Resource-conscious scheduling for energy efficiency on multicore processors

Proceedings of the 5th European conference on Computer systems
Q-clouds: managing performance interference effects for QoS-aware clouds

Proceedings of the 5th European conference on Computer systems
Contention-Aware Scheduling on Multicore Systems

ACM Transactions on Computer Systems (TOCS)
Architecting on-chip interconnects for stacked 3D STT-RAM caches in CMPs

Proceedings of the 38th annual international symposium on Computer architecture
A case for NUMA-aware contention management on multicore systems

USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
Mechanistic-empirical processor performance modeling for constructing CPI stacks on real hardware

ISPASS '11 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software
Cuanta: quantifying effects of shared on-chip resource interference for consolidated virtual machines

Proceedings of the 2nd ACM Symposium on Cloud Computing

Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers

Proceedings of the 40th Annual International Symposium on Computer Architecture
Characterization and modeling of PIDX parallel I/O for performance optimization

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Enabling fair pricing on HPC systems with node sharing

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Virtual Machine Coscheduling: A Game Theoretic Approach

UCC '13 Proceedings of the 2013 IEEE/ACM 6th International Conference on Utility and Cloud Computing

Quantified Score

Hi-index	0.00

Visualization

Abstract

When multiple threads or processes run on a multi-core CPU they compete for shared resources, such as caches and memory controllers, and can suffer performance degradation as high as 200%. We design and evaluate a new machine learning model that estimates this degradation online, on previously unseen workloads, and without perturbing the execution. Our motivation is to help data center and HPC cluster operators effectively use workload consolidation. Data center consolidation is about placing many applications on the same server to maximize hardware utilization. In HPC clusters, processes of the same distributed applications run on the same machine. Consolidation improves hardware utilization, but may sacrifice performance as processes compete for resources. Our model helps determine when consolidation is overly harmful to performance. Our work is the first to apply machine learning to this problem domain, and we report on our experience reaping the advantages of machine learning while navigating around its limitations. We demonstrate how the model can be used to improve performance fidelity and save energy for HPC workloads.