Survey of scheduling techniques for addressing shared resources in multicore processors

Authors:
Sergey Zhuravlev;Juan Carlos Saez;Sergey Blagodurov;Alexandra Fedorova;Manuel Prieto
Affiliations:
Simon Fraser University, Canada;Complutense University of Madrid, Madrid, Spain;Simon Fraser University, Canada;Simon Fraser University, Canada;Complutense University of Madrid, Madrid, Spain
Venue:
ACM Computing Surveys (CSUR)
Year:
2012

Citing 96
Cited 2

Optimal Partitioning of Cache Memory

IEEE Transactions on Computers
Impact of sharing-based thread placement on multithreaded architectures

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Memory bandwidth limitations of future microprocessors

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Cilk: an efficient multithreaded runtime system

Journal of Parallel and Distributed Computing - Special issue on multithreading for multiprocessors
Selective cache ways: on-demand cache resource allocation

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Memory access scheduling

Proceedings of the 27th annual international symposium on Computer architecture
Parallel programming in OpenMP

Parallel programming in OpenMP
Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
The working set model for program behavior

Communications of the ACM
Symbiotic jobscheduling with priorities for a simultaneous multithreading processor

SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Scheduler-based DRAM energy management

Proceedings of the 39th annual Design Automation Conference
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Compile-Time Based Performance Prediction

LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
OS-Controlled Cache Predictability for Real-Time Systems

RTAS '97 Proceedings of the 3rd IEEE Real-Time Technology and Applications Symposium (RTAS '97)
A New Memory Monitoring Scheme for Memory-Aware Scheduling and Partitioning

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Dynamic Partitioning of Shared Cache Memory

The Journal of Supercomputing
CQoS: a framework for enabling QoS in shared caches of CMP platforms

Proceedings of the 18th annual international conference on Supercomputing
Dynamic tracking of page miss ratio curve for memory management

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Adaptive History-Based Memory Schedulers

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Memory Bandwidth Aware Scheduling for SMP Cluster Nodes

PDP '05 Proceedings of the 13th Euromicro Conference on Parallel, Distributed and Network-Based Processing
Scheduling Algorithms for Effective Thread Pairing on Hybrid Multiprocessors

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Optimizing Replication, Communication, and Capacity Allocation in CMPs

Proceedings of the 32nd annual international symposium on Computer Architecture
Organizing the Last Line of Defense before Hitting the Memory Wall for CMPs

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
A Case for MLP-Aware Cache Replacement

Proceedings of the 33rd annual international symposium on Computer Architecture
An analytical model for cache replacement policy performance

SIGMETRICS '06/Performance '06 Proceedings of the joint international conference on Measurement and modeling of computer systems
Architectural support for operating system-driven CMP cache management

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Communist, utilitarian, and capitalist cache policies on CMPs: caches as a shared resource

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
StatCache: a probabilistic approach to efficient and accurate data locality analysis

ISPASS '04 Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software
Thread-associative memory for multicore and multithreaded computing

Proceedings of the 2006 international symposium on Low power electronics and design
Fair Queuing Memory Systems

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Managing Distributed, Shared L2 Caches through OS-Level Page Allocation

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
CMP cache performance projection: accessibility vs. capacity

ACM SIGARCH Computer Architecture News
From chaos to QoS: case studies in CMP resource management

ACM SIGARCH Computer Architecture News
Improving fairness, throughput and energy-efficiency on a chip multiprocessor through DVFS

ACM SIGARCH Computer Architecture News
Virtual private caches

Proceedings of the 34th annual international symposium on Computer architecture
Optimistic parallelism requires abstractions

Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
QoS policies and architecture for cache/memory in CMP platforms

Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A one-shot configurable-cache tuner for improved energy and performance

Proceedings of the conference on Design, automation and test in Europe
Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors

Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Dryad: distributed data-parallel programs from sequential building blocks

Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Cooperative cache partitioning for chip multiprocessors

Proceedings of the 21st annual international conference on Supercomputing
Eliminating inter-process cache interference through cache reconfigurability for real-time and low-power embedded multi-tasking systems

CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Path: page access tracking to improve memory management

Proceedings of the 6th international symposium on Memory management
CacheScouts: Fine-Grain Monitoring of Shared Caches in CMP Platforms

PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler

PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Microarchitecture-Independent Workload Characterization

IEEE Micro
An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
A power-aware shared cache mechanism based on locality assessment of memory reference for CMPs

MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Adaptive set pinning: managing shared caches in chip multiprocessors

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Efficient operating system scheduling for performance-asymmetric multi-core architectures

Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Memory performance attacks: denial of memory service in multi-core systems

SS'07 Proceedings of 16th USENIX Security Symposium on USENIX Security Symposium
A table-based method for single-pass cache optimization

Proceedings of the 18th ACM Great Lakes symposium on VLSI
Accurate memory signatures and synthetic address traces for HPC applications

Proceedings of the 22nd annual international conference on Supercomputing
Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
3D-Stacked Memory Architectures for Multi-core Processors

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Cache modeling in probabilistic execution time analysis

Proceedings of the 45th annual Design Automation Conference
PAM: a novel performance/power aware meta-scheduler for multi-core systems

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Using OS Observations to Improve Performance in Multicore Systems

IEEE Micro
Static analysis for fast and accurate design space exploration of caches

CODES+ISSS '08 Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis
Adaptive insertion policies for managing shared caches

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Analysis and approximation of optimal co-scheduling on chip multiprocessors

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Serialization sets: a dynamic dependence-based parallel execution model

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Modeling of cache access behavior based on Zipf's law

Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
Towards practical page coloring-based multicore cache management

Proceedings of the 4th ACM European conference on Computer systems
Reducing the harmful effects of last-level cache polluters with an OS-level, software-only pollute buffer

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Coordinated management of multiple interacting resources in chip multiprocessors: A machine learning approach

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
A study on optimally co-scheduling jobs of different lengths on chip multiprocessors

Proceedings of the 6th ACM conference on Computing frontiers
Enhancing operating system support for multicore processors by using hardware performance monitoring

ACM SIGOPS Operating Systems Review
FlexDCP: a QoS framework for CMP architectures

ACM SIGOPS Operating Systems Review
Program locality analysis using reuse distance

ACM Transactions on Programming Languages and Systems (TOPLAS)
PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches

Proceedings of the 36th annual international symposium on Computer architecture
Reactive NUCA: near-optimal block placement and replication in distributed caches

Proceedings of the 36th annual international symposium on Computer architecture
The multikernel: a new OS architecture for scalable multicore systems

Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
A case for integrated processor-cache partitioning in chip multiprocessors

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Enabling software management for multicore caches with a lightweight hardware support

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Evaluation techniques for storage hierarchies

IBM Systems Journal
MCC-DB: minimizing cache conflicts in multi-core processors for databases

Proceedings of the VLDB Endowment
Load balancing on speed

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Addressing shared resource contention in multicore processors via scheduling

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Resource-conscious scheduling for energy efficiency on multicore processors

Proceedings of the 5th European conference on Computer systems
Locating cache performance bottlenecks using data profiling

Proceedings of the 5th European conference on Computer systems
Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Ease of use with concurrent collections (CnC)

HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
Corey: an operating system for many cores

OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Design principles for end-to-end multicore schedulers

HotPar'10 Proceedings of the 2nd USENIX conference on Hot topics in parallelism
Synchronization via scheduling: techniques for efficiently managing shared state

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
A case for NUMA-aware contention management on multicore systems

USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
ACCESS: Smart scheduling for asymmetric cache CMPs

HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
FACT: a framework for adaptive contention-aware thread migrations

Proceedings of the 8th ACM International Conference on Computing Frontiers

A flexible simulation framework for multicore schedulers: work in progress paper

Proceedings of the 2013 ACM SIGSIM conference on Principles of advanced discrete simulation
An empirical model for predicting cross-core performance interference on multicore processors

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques

Quantified Score

Hi-index	0.00

Visualization

Abstract

Chip multicore processors (CMPs) have emerged as the dominant architecture choice for modern computing platforms and will most likely continue to be dominant well into the foreseeable future. As with any system, CMPs offer a unique set of challenges. Chief among them is the shared resource contention that results because CMP cores are not independent processors but rather share common resources among cores such as the last level cache (LLC). Shared resource contention can lead to severe and unpredictable performance impact on the threads running on the CMP. Conversely, CMPs offer tremendous opportunities for mulithreaded applications, which can take advantage of simultaneous thread execution as well as fast inter thread data sharing. Many solutions have been proposed to deal with the negative aspects of CMPs and take advantage of the positive. This survey focuses on the subset of these solutions that exclusively make use of OS thread-level scheduling to achieve their goals. These solutions are particularly attractive as they require no changes to hardware and minimal or no changes to the OS. The OS scheduler has expanded well beyond its original role of time-multiplexing threads on a single core into a complex and effective resource manager. This article surveys a multitude of new and exciting work that explores the diverse new roles the OS scheduler can successfully take on.