ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Proceedings of the 27th annual international symposium on Computer architecture
Symbiotic jobscheduling for a simultaneous multithreaded processor
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Boosting SMT Performance by Speculation Control
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
CQoS: a framework for enabling QoS in shared caches of CMP platforms
Proceedings of the 18th annual international conference on Supercomputing
Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Fairness and Throughput in Switch on Event Multithreading
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 34th annual international symposium on Computer architecture
QoS policies and architecture for cache/memory in CMP platforms
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Memory performance attacks: denial of memory service in multi-core systems
SS'07 Proceedings of 16th USENIX Security Symposium on USENIX Security Symposium
Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Globally-Synchronized Frames for Guaranteed Quality-of-Service in On-Chip Networks
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Per-thread cycle accounting in SMT processors
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
A light-weight fairness mechanism for chip multiprocessor memory systems
Proceedings of the 6th ACM conference on Computing frontiers
Rate-based QoS techniques for cache/memory in CMP platforms
Proceedings of the 23rd international conference on Supercomputing
Preemptive virtual clock: a flexible, efficient, and cost-effective QOS scheme for networks-on-chip
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Application-aware prioritization mechanisms for on-chip networks
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Hardware execution throttling for multi-core resource management
USENIX'09 Proceedings of the 2009 conference on USENIX Annual technical conference
Aérgia: exploiting packet latency slack in on-chip networks
Proceedings of the 37th annual international symposium on Computer architecture
Online cache modeling for commodity multicore processors
ACM SIGOPS Operating Systems Review
LOFT: A High Performance Network-on-Chip Providing Quality-of-Service Support
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Memory Latency Reduction via Thread Throttling
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
CoQoS: Coordinating QoS-aware shared resources in NoC-based SoCs
Journal of Parallel and Distributed Computing
Memory systems in the many-core era: challenges, opportunities, and solution directions
Proceedings of the international symposium on Memory management
Prefetch-aware shared resource management for multi-core systems
Proceedings of the 38th annual international symposium on Computer architecture
The impact of memory subsystem resource sharing on datacenter applications
Proceedings of the 38th annual international symposium on Computer architecture
FACT: a framework for adaptive contention-aware thread migrations
Proceedings of the 8th ACM International Conference on Computing Frontiers
Bandwidth constrained coordinated HW/SW prefetching for multicores
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
REEact: a customizable virtual execution manager for multicore platforms
VEE '12 Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments
Bubble-Up: increasing utilization in modern warehouse scale computers via sensible co-locations
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Parallel application memory scheduling
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Reducing memory interference in multicore systems via application-aware memory channel partitioning
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
ACM Transactions on Computer Systems (TOCS)
Topology-Aware quality-of-service support in highly integrated chip multiprocessors
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
Courteous cache sharing: being nice to others in capacity management
Proceedings of the 49th Annual Design Automation Conference
Trace-driven simulation of memory system scheduling in multithread application
Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Providing fairness on shared-memory multiprocessors via process scheduling
Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
Compiling for niceness: mitigating contention for QoS in warehouse scale computers
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Staged memory scheduling: achieving high performance and scalability in heterogeneous systems
Proceedings of the 39th Annual International Symposium on Computer Architecture
Dynamic QoS management for chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
Optimizing datacenter power with memory system levers for guaranteed quality-of-service
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
HaLock: hardware-assisted lock contention detection in multithreaded applications
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Application-to-core mapping policies to reduce memory interference in multi-core systems
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Measuring interference between live datacenter applications
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Per-thread cycle accounting in multicore processors
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Fair CPU time accounting in CMP+SMT processors
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Holistic run-time parallelism management for time and energy efficiency
Proceedings of the 27th international ACM conference on International conference on supercomputing
Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers
Proceedings of the 40th Annual International Symposium on Computer Architecture
Writeback-aware bandwidth partitioning for multi-core systems with PCM
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Ubik: efficient cache sharing with strict qos for latency-critical workloads
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
REF: resource elasticity fairness with sharing incentives for multiprocessors
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Cores in a chip-multiprocessor (CMP) system share multiple hardware resources in the memory subsystem. If resource sharing is unfair, some applications can be delayed significantly while others are unfairly prioritized. Previous research proposed separate fairness mechanisms in each individual resource. Such resource-based fairness mechanisms implemented independently in each resource can make contradictory decisions, leading to low fairness and loss of performance. Therefore, a coordinated mechanism that provides fairness in the entire shared memory system is desirable. This paper proposes a new approach that provides fairness in the entire shared memory system, thereby eliminating the need for and complexity of developing fairness mechanisms for each individual resource. Our technique, Fairness via Source Throttling (FST), estimates the unfairness in the entire shared memory system. If the estimated unfairness is above a threshold set by system software, FST throttles down cores causing unfairness by limiting the number of requests they can inject into the system and the frequency at which they do. As such, our source-based fairness control ensures fairness decisions are made in tandem in the entire memory system. FST also enforces thread priorities/weights, and enables system software to enforce different fairness objectives and fairness-performance tradeoffs in the memory system. Our evaluations show that FST provides the best system fairness and performance compared to four systems with no fairness control and with state-of-the-art fairness mechanisms implemented in both shared caches and memory controllers.