A Framework for Providing Quality of Service in Chip Multi-Processors

Authors:
Fei Guo;Yan Solihin;Li Zhao;Ravishankar Iyer
Venue:
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2007


Abstract

The trends in enterprise IT toward service-oriented computing, server consolidation, and virtual computing point to a future in which workloads are becoming increasingly diverse in terms of performance, reliability, and availability requirements. It can be expected that more and more applications with diverse requirements will run on a CMP and share platform resources such as the lowest level cache and off-chip bandwidth. In this environment, it is desirable to have microarchitecture and software support that can provide a guarantee of a certain level of performance, which we refer to as performance Quality of Service.

In this paper, we investigate a framework that would be needed for a CMP to fully provide QoS. We found that the ability of a CMP to partition platform resources alone is not sufficient for fully providing QoS. We also need an appropriate way to specify a QoS target, and an admission control policy that accepts jobs only when their QoS targets can be satisfied. We also found that providing strict QoS often leads to a significant reduction in throughput due to resource fragmentation. We propose novel throughput optimization techniques that include: (1) exploiting various QoS execution modes, and (2) a microarchitecture technique that steals excess resources from a job while still meeting its QoS target. We evaluated our QoS framework with a full system simulation of a 4-core CMP and a recent version of the Linux Operating System. We found that compared to an unoptimized scheme, the throughput can be improved by up to 47%, making the throughput significantly closer to a non-QoS CMP.
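
As a rough illustration of the admission control idea mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that a QoS target is expressed as a number of shared-cache ways plus a percentage of off-chip bandwidth, and that the platform has 16 ways; the names `Job`, `AdmissionController`, and `try_admit` are hypothetical. A job is accepted only if its full target fits in the resources still unreserved.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    cache_ways: int      # assumed QoS target: shared-cache ways reserved for the job
    bandwidth_pct: int   # assumed QoS target: share of off-chip bandwidth, in percent


@dataclass
class AdmissionController:
    total_ways: int = 16             # assumed platform size, not taken from the paper
    total_bandwidth_pct: int = 100
    admitted: list = field(default_factory=list)

    def free_ways(self) -> int:
        return self.total_ways - sum(j.cache_ways for j in self.admitted)

    def free_bandwidth_pct(self) -> int:
        return self.total_bandwidth_pct - sum(j.bandwidth_pct for j in self.admitted)

    def try_admit(self, job: Job) -> bool:
        """Accept the job only if every part of its QoS target can be reserved."""
        fits = (job.cache_ways <= self.free_ways()
                and job.bandwidth_pct <= self.free_bandwidth_pct())
        if fits:
            self.admitted.append(job)
        return fits


controller = AdmissionController()
results = [
    controller.try_admit(Job("A", cache_ways=8, bandwidth_pct=40)),
    controller.try_admit(Job("B", cache_ways=6, bandwidth_pct=40)),
    # Only 2 ways remain unreserved, so a job asking for 4 is rejected.
    controller.try_admit(Job("C", cache_ways=4, bandwidth_pct=10)),
]
```

In this toy run, job C is turned away even though two cache ways and some bandwidth sit idle, which is the kind of resource fragmentation the abstract says reduces throughput under strict QoS.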