Addressing shared resource contention in multicore processors via scheduling

Authors:
Sergey Zhuravlev;Sergey Blagodurov;Alexandra Fedorova
Affiliations:
Simon Fraser University, Vancouver, BC, Canada;Simon Fraser University, Vancouver, BC, Canada;Simon Fraser University, Vancouver, BC, Canada
Venue:
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Year:
2010

Citing 20
Cited 87

Symbiotic jobscheduling for a simultaneous multithreaded processor

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
OS-Controlled Cache Predictability for Real-Time Systems

RTAS '97 Proceedings of the 3rd IEEE Real-Time Technology and Applications Symposium (RTAS '97)
A New Memory Monitoring Scheme for Memory-Aware Scheduling and Partitioning

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Memory Bandwidth Aware Scheduling for SMP Cluster Nodes

PDP '05 Proceedings of the 13th Euromicro Conference on Parallel, Distributed and Network-Based Processing
Pin: building customized program analysis tools with dynamic instrumentation

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
StatCache: a probabilistic approach to efficient and accurate data locality analysis

ISPASS '04 Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software
Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Managing Distributed, Shared L2 Caches through OS-Level Page Allocation

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors

Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Effective Management of DRAM Bandwidth in Multicore Processors

PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler

PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Microarchitecture-Independent Workload Characterization

IEEE Micro
Using OS Observations to Improve Performance in Multicore Systems

IEEE Micro
Analysis and approximation of optimal co-scheduling on chip multiprocessors

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Towards practical page coloring-based multicore cache management

Proceedings of the 4th ACM European conference on Computer systems
HASS: a scheduler for heterogeneous multicore systems

ACM SIGOPS Operating Systems Review
Rate-based QoS techniques for cache/memory in CMP platforms

Proceedings of the 23rd international conference on Supercomputing
vGreen: a system for energy efficient computing in virtualized environments

Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design

Managing contention for shared resources on multicore processors

Communications of the ACM
Managing Contention for Shared Resources on Multicore Processors

Queue - Power Management
Q-clouds: managing performance interference effects for QoS-aware clouds

Proceedings of the 5th European conference on Computer systems
AKULA: a toolset for experimenting and developing thread placement algorithms on multicore systems

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
A case for NUMA-aware contention management on multicore systems

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Leveraging workload diversity through OS scheduling to maximize performance on single-ISA heterogeneous multicore systems

Journal of Parallel and Distributed Computing
Online cache modeling for commodity multicore processors

ACM SIGOPS Operating Systems Review
Building extensible networks with rule-based forwarding

OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Memory Latency Reduction via Thread Throttling

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Voltage Smoothing: Characterizing and Mitigating Voltage Noise in Production Processors via Software-Guided Thread Scheduling

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
All-window profiling and composable models of cache sharing

Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
ULCC: a user-level facility for optimizing shared cache performance on multicores

Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Directly characterizing cross core interference through contention synthesis

Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
In search for contention-descriptive metrics in HPC cluster environment

Proceedings of the 2nd ACM/SPEC International Conference on Performance engineering
SRM-buffer: an OS buffer management technique to prevent last level cache from thrashing in multicores

Proceedings of the sixth conference on Computer systems
Memory management in NUMA multicore systems: trapped between cache contention and interconnect overhead

Proceedings of the international symposium on Memory management
Dynamic adaptive scheduling for virtual machines

Proceedings of the 20th international symposium on High performance distributed computing
Enhancement of Xen's scheduler for MapReduce workloads

Proceedings of the 20th international symposium on High performance distributed computing
The impact of memory subsystem resource sharing on datacenter applications

Proceedings of the 38th annual international symposium on Computer architecture
DBAR: an efficient routing algorithm to support multiple concurrent applications in networks-on-chip

Proceedings of the 38th annual international symposium on Computer architecture
Contentiousness vs. sensitivity: improving contention aware runtime systems on multicore architectures

Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era
Pervasive parallelism for managed runtimes

HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism
A case for NUMA-aware contention management on multicore systems

USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
FACT: a framework for adaptive contention-aware thread migrations

Proceedings of the 8th ACM International Conference on Computing Frontiers
Scheduling javasymphony applications on many-core parallel computers

Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Cuanta: quantifying effects of shared on-chip resource interference for consolidated virtual machines

Proceedings of the 2nd ACM Symposium on Cloud Computing
Agent based load balancing scheme using affinity processor scheduling for multicore architectures

WSEAS Transactions on Computers
A cluster computer performance predictor for memory scheduling

ICA3PP'11 Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part II
Thread Tranquilizer: Dynamically reducing performance variation

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
CRUISE: cache replacement and utility-aware scheduling

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
DejaVu: accelerating resource allocation in virtualized environments

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Region scheduling: efficiently using the cache architectures via page-level affinity

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
REEact: a customizable virtual execution manager for multicore platforms

VEE '12 Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments
Bubble-Up: increasing utilization in modern warehouse scale computers via sensible co-locations

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Reducing memory interference in multicore systems via application-aware memory channel partitioning

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multicore Memory Systems

ACM Transactions on Computer Systems (TOCS)
Machine learning based performance prediction for multi-core simulation

MIWAI'11 Proceedings of the 5th international conference on Multi-Disciplinary Trends in Artificial Intelligence
Share memory aware scheduler: balancing performance and fairness

Proceedings of the great lakes symposium on VLSI
Reuse distance based performance modeling and workload mapping

Proceedings of the 9th conference on Computing Frontiers
Toward predictable performance in software packet-processing platforms

NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Courteous cache sharing: being nice to others in capacity management

Proceedings of the 49th Annual Design Automation Conference
Providing fairness on shared-memory multiprocessors via process scheduling

Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
Compiling for niceness: mitigating contention for QoS in warehouse scale computers

Proceedings of the Tenth International Symposium on Code Generation and Optimization
Achieving application-centric performance targets via consolidation on multicores: myth or reality?

Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Dynamic adaptive virtual core mapping to improve power, energy, and performance in multi-socket multicores

Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Cache Conscious Task Regrouping on Multicore Processors

CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Nonuniform memory affinity strategy in multithreaded sparse matrix computations

Proceedings of the 2012 Symposium on High Performance Computing
Dynamic virtual machine scheduling in clouds for architectural shared resources

HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
Reducing last level cache pollution in NUMA multicore systems for improving cache performance

ICCSA'12 Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part III
Scalability-based manycore partitioning

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Efficient techniques for predicting cache sharing and throughput

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
A software memory partition approach for eliminating bank-level interference in multicore systems

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Survey of scheduling techniques for addressing shared resources in multicore processors

ACM Computing Surveys (CSUR)
When less is more (LIMO):controlled parallelism forimproved efficiency

Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
Critical path-based thread placement for NUMA systems

ACM SIGMETRICS Performance Evaluation Review
Resource-freeing attacks: improve your cloud performance (at your neighbor's expense)

Proceedings of the 2012 ACM conference on Computer and communications security
Exploiting inter-sequence correlations for program behavior prediction

Proceedings of the ACM international conference on Object oriented programming systems languages and applications
A practical method for estimating performance degradation on multicore processors, and its application to HPC workloads

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
ADAPT: A framework for coscheduling multithreaded programs

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
The Impact of Hypervisor Layer on Database Applications

UCC '12 Proceedings of the 2012 IEEE/ACM Fifth International Conference on Utility and Cloud Computing
Providing performance guarantees in multipass network processors

IEEE/ACM Transactions on Networking (TON)
Moths: Mobile threads for on-chip networks

ACM Transactions on Embedded Computing Systems (TECS) - Special section on ESTIMedia'12, LCTES'11, rigorous embedded systems design, and multiprocessor system-on-chip for cyber-physical systems
HOTL: a higher order theory of locality

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
To hardware prefetch or not to prefetch?: a virtualized environment study and core binding approach

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Traffic management: a holistic approach to memory placement on NUMA systems

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Cache-Conscious Wavefront Scheduling

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Interference and locality-aware task scheduling for MapReduce applications in virtual clusters

Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
COSMIC: middleware for high performance and reliable multiprocessing on xeon phi coprocessors

Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
Redesigning MPI shared memory communication for large multi-core architecture

Computer Science - Research and Development
Analyzing resource interdependencies in multi-core architectures to improve scheduling decisions

Proceedings of the 28th Annual ACM Symposium on Applied Computing
Scheduling optimization in multicore multithreaded microprocessors through dynamic modeling

Proceedings of the ACM International Conference on Computing Frontiers
Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers

Proceedings of the 40th Annual International Symposium on Computer Architecture
Interference resilient PDES on multi-core systems: towards proportional slowdown

Proceedings of the 2013 ACM SIGSIM conference on Principles of advanced discrete simulation
Reducing contention through priority updates

Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Characterization and modeling of PIDX parallel I/O for performance optimization

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
GoldRush: resource efficient in situ scientific data analytics using fine-grained interference aware execution

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Coordinated power-performance optimization in manycores

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
L1-bandwidth aware thread allocation in multicore SMT processors

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
An empirical model for predicting cross-core performance interference on multicore processors

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
DeepDive: transparently identifying and managing performance interference in virtualized environments

USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
ReSense: Mapping dynamic workloads of colocated multithreaded applications using resource sensitivity

ACM Transactions on Architecture and Code Optimization (TACO)
On modeling contention for shared caches in multi-core processors with techniques from ecology

Natural Computing: an international journal
Towards software performance engineering for multicore and manycore systems

ACM SIGMETRICS Performance Evaluation Review
BPM/BPM+: Software-based dynamic memory partitioning mechanisms for mitigating DRAM bank-/channel-level interferences in multicore systems

ACM Transactions on Architecture and Code Optimization (TACO)
Exploiting multi-core nodes in peer-to-peer grids

Journal of Parallel and Distributed Computing
Virtual machine consolidation based on interference modeling

The Journal of Supercomputing
What to expect when you are consolidating: effective prediction models of application performance on multicores

Cluster Computing

Quantified Score

Hi-index	0.02

Visualization

Abstract

Contention for shared resources on multicore processors remains an unsolved problem in existing systems despite significant research efforts dedicated to this problem in the past. Previous solutions focused primarily on hardware techniques and software page coloring to mitigate this problem. Our goal is to investigate how and to what extent contention for shared resource can be mitigated via thread scheduling. Scheduling is an attractive tool, because it does not require extra hardware and is relatively easy to integrate into the system. Our study is the first to provide a comprehensive analysis of contention-mitigating techniques that use only scheduling. The most difficult part of the problem is to find a classification scheme for threads, which would determine how they affect each other when competing for shared resources. We provide a comprehensive analysis of such classification schemes using a newly proposed methodology that enables to evaluate these schemes separately from the scheduling algorithm itself and to compare them to the optimal. As a result of this analysis we discovered a classification scheme that addresses not only contention for cache space, but contention for other shared resources, such as the memory controller, memory bus and prefetching hardware. To show the applicability of our analysis we design a new scheduling algorithm, which we prototype at user level, and demonstrate that it performs within 2\% of the optimal. We also conclude that the highest impact of contention-aware scheduling techniques is not in improving performance of a workload as a whole but in improving quality of service or performance isolation for individual applications.