Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems

Authors:
Eiman Ebrahimi;Chang Joo Lee;Onur Mutlu;Yale N. Patt
Affiliations:
The University of Texas at Austin, Austin, TX, USA;The University of Texas at Austin, Austin, TX, USA;Carnegie Mellon University, Pittsburgh, PA, USA;The University of Texas at Austin, Austin, TX, USA
Venue:
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Year:
2010

Abstract

Cores in a chip-multiprocessor (CMP) system share multiple hardware resources in the memory subsystem. If resource sharing is unfair, some applications can be delayed significantly while others are unfairly prioritized. Previous research proposed separate fairness mechanisms in each individual resource. Such resource-based fairness mechanisms implemented independently in each resource can make contradictory decisions, leading to low fairness and loss of performance. Therefore, a coordinated mechanism that provides fairness in the entire shared memory system is desirable. This paper proposes a new approach that provides fairness in the entire shared memory system, thereby eliminating the need for and complexity of developing fairness mechanisms for each individual resource. Our technique, Fairness via Source Throttling (FST), estimates the unfairness in the entire shared memory system. If the estimated unfairness is above a threshold set by system software, FST throttles down cores causing unfairness by limiting the number of requests they can inject into the system and the frequency at which they do. As such, our source-based fairness control ensures fairness decisions are made in tandem in the entire memory system. FST also enforces thread priorities/weights, and enables system software to enforce different fairness objectives and fairness-performance tradeoffs in the memory system. Our evaluations show that FST provides the best system fairness and performance compared to four systems with no fairness control and with state-of-the-art fairness mechanisms implemented in both shared caches and memory controllers.
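The control loop described in the abstract (estimate system-wide unfairness, compare it to a software-set threshold, then throttle the offending core at its source) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not define the unfairness metric or the throttling steps, so the sketch assumes unfairness is the ratio of the largest to the smallest per-core slowdown, and that throttling halves a core's cap on outstanding requests and doubles its injection interval. The names `Core`, `interference`, `estimate_unfairness`, and `throttle_step` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    slowdown: float       # assumed metric: shared-run time / alone-run time
    interference: float   # hypothetical estimate of delay this core causes others
    max_requests: int     # cap on requests the core may have in flight
    inject_interval: int  # minimum cycles between injected requests


def estimate_unfairness(cores):
    """Assumed metric: largest slowdown divided by smallest slowdown."""
    slowdowns = [c.slowdown for c in cores]
    return max(slowdowns) / min(slowdowns)


def throttle_step(cores, threshold, min_requests=1):
    """One interval of a simplified source-throttling decision.

    Returns the name of the throttled core, or None if nothing was throttled.
    """
    if len(cores) < 2 or estimate_unfairness(cores) <= threshold:
        return None  # fairness target already met, or nothing to compare

    # The most slowed-down core is the victim; among the others, pick the
    # core estimated to cause the most interference.
    slowest = max(cores, key=lambda c: c.slowdown)
    culprit = max((c for c in cores if c is not slowest),
                  key=lambda c: c.interference)

    # Throttle at the source: fewer in-flight requests, injected less often.
    culprit.max_requests = max(min_requests, culprit.max_requests // 2)
    culprit.inject_interval *= 2
    return culprit.name


if __name__ == "__main__":
    cores = [Core("A", 1.2, 9.0, 32, 1), Core("B", 3.6, 1.0, 32, 1)]
    print(throttle_step(cores, threshold=1.4))  # core A is throttled
```

Because the decision is made once, at the request sources, there is a single point of control rather than separate and possibly conflicting policies in the cache and the memory controller, which is the motivation the abstract gives for source-based control.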