Quality of service shared cache management in chip multiprocessor architecture

  • Authors:
  • Fei Guo; Yan Solihin; Li Zhao; Ravishankar Iyer

  • Affiliations:
  • VMware, Inc., Sunnyvale, CA; North Carolina State University, Raleigh, NC; Intel Corporation, Hillsboro, OR; Intel Corporation, Hillsboro, OR

  • Venue:
  • ACM Transactions on Architecture and Code Optimization (TACO)

  • Year:
  • 2010

Abstract

The trends in enterprise IT toward service-oriented computing, server consolidation, and virtual computing point to a future in which workloads are becoming increasingly diverse in terms of performance, reliability, and availability requirements. It can be expected that more and more applications with diverse requirements will run on a Chip Multi-Processor (CMP) and share platform resources such as the lowest-level cache and off-chip bandwidth. In this environment, it is desirable to have microarchitecture and software support that can guarantee a certain level of performance, which we refer to as performance Quality of Service (QoS). In this article, we investigated the framework that would be needed to manage the shared cache resource in order to fully provide QoS in a CMP. We found that, to fully provide QoS, we need to specify an appropriate QoS target for each job and apply an admission control policy that accepts jobs only when their QoS targets can be satisfied. We also found that providing strict QoS often leads to a significant reduction in throughput due to resource fragmentation. We proposed throughput optimization techniques that include: (1) exploiting various QoS execution modes, and (2) a microarchitecture technique, which we refer to as resource stealing, that detects and reallocates excess cache capacity from a job while preserving its QoS target. We designed and evaluated three algorithms for performing resource stealing, which differ in how aggressively they steal excess cache capacity and in the degree of confidence with which they meet QoS targets. In addition, we proposed a mechanism to dynamically enable or disable resource stealing depending on whether other jobs can benefit from additional cache capacity. We evaluated our QoS framework with a full-system simulation of a 4-core CMP running a recent version of the Linux operating system. We found that, compared to an unoptimized scheme, throughput can be improved by up to 47%, bringing it significantly closer to that of a non-QoS CMP.
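
The abstract names two core mechanisms: an admission control policy that accepts a job only when its QoS target can be satisfied, and resource stealing, which reclaims cache capacity a job does not need while preserving its QoS target. The sketch below illustrates both ideas over a way-partitioned shared cache. It is a minimal illustration under assumed details, not the paper's implementation; the names (Job, QosCache, admit, steal_excess) and the mapping from a QoS target to a number of cache ways are hypothetical.

```cpp
// Minimal sketch (assumption: way-partitioned last-level cache) of the two
// mechanisms named in the abstract: QoS admission control and resource
// stealing. All names here (Job, QosCache, admit, steal_excess) are
// hypothetical illustrations, not the paper's actual interfaces.
#include <algorithm>
#include <iostream>
#include <vector>

struct Job {
    int id;
    int ways_for_target;   // ways required to meet the job's QoS target
    int allocated_ways;    // ways currently reserved for the job
    int observed_hot_ways; // ways the job is actually observed to use
};

class QosCache {
public:
    explicit QosCache(int total_ways) : total_ways_(total_ways) {}

    // Admission control: accept a job only if enough unreserved cache
    // capacity remains to satisfy its QoS target; otherwise reject it.
    bool admit(Job job) {
        if (job.ways_for_target > free_ways()) return false;
        job.allocated_ways = job.ways_for_target;
        jobs_.push_back(job);
        return true;
    }

    // Resource stealing: reclaim ways a job is not actually using, so other
    // jobs can benefit, while leaving enough capacity to preserve the
    // stealing victim's QoS target.
    int steal_excess() {
        int reclaimed = 0;
        for (Job& j : jobs_) {
            int needed =
                std::max(1, std::min(j.ways_for_target, j.observed_hot_ways));
            int excess = j.allocated_ways - needed;
            if (excess > 0) {
                j.allocated_ways = needed;
                reclaimed += excess;
            }
        }
        return reclaimed;  // capacity now available for reallocation
    }

    int free_ways() const {
        int reserved = 0;
        for (const Job& j : jobs_) reserved += j.allocated_ways;
        return total_ways_ - reserved;
    }

private:
    int total_ways_;
    std::vector<Job> jobs_;
};

int main() {
    QosCache cache(16);  // a 16-way shared cache
    cache.admit({1, /*ways_for_target=*/8, 0, /*observed_hot_ways=*/5});
    cache.admit({2, /*ways_for_target=*/6, 0, /*observed_hot_ways=*/6});
    std::cout << "free ways before stealing: " << cache.free_ways() << "\n";
    std::cout << "reclaimed by stealing:     " << cache.steal_excess() << "\n";
    std::cout << "free ways after stealing:  " << cache.free_ways() << "\n";
    return 0;
}
```

In the article itself, the stealing decision is driven by observed cache behavior rather than a fixed "hot ways" count, and three stealing algorithms of varying aggressiveness and confidence are evaluated; the fixed threshold above merely stands in for that measurement.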