As multi-core architectures flourish in the marketplace, multi-application workload scenarios (such as server consolidation) are growing rapidly. When multiple applications run simultaneously on a platform, contention for shared platform resources such as the last-level cache has been shown to severely degrade performance and quality of service (QoS). Yet today's platforms cannot accurately monitor shared cache usage or disambiguate its effects on the performance behavior of each individual application. In this paper, we investigate low-overhead mechanisms for fine-grain monitoring of shared cache resources along three vectors: (a) occupancy -- how much space is being used, and by whom; (b) interference -- how much contention is present, and who is being affected; and (c) sharing -- how threads are cooperating. We propose the CacheScouts monitoring architecture, which combines novel tagging (software-guided monitoring IDs) and sampling (set sampling) mechanisms to achieve shared cache monitoring on a per-application basis at low overhead (≤0.1%) and with very little loss of accuracy (≤5%). We also present case studies showing how operating systems (OS) and virtual machine monitors (VMMs) can use CacheScouts for (a) characterizing execution profiles, (b) optimizing scheduling for performance management, (c) providing QoS, and (d) metering for chargeback.
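The combination of per-line monitoring IDs and set sampling can be illustrated with a small software model: each cache line carries the monitoring ID of the application that fetched it, and occupancy is estimated by counting tags in only a subset of sets, then scaling up. The sketch below is purely illustrative (the cache geometry, sampling stride, and synthetic ownership pattern are made up for the example, not taken from the paper's hardware design):

```python
# Illustrative model of set sampling for per-application cache occupancy.
# Parameters and the ownership pattern are assumptions for this sketch.

NUM_SETS, WAYS = 64, 8
SAMPLE_STRIDE = 4  # monitor only every 4th set

# Each line holds the monitoring ID of its owner; here ownership is
# synthesized deterministically across three hypothetical applications.
cache = [[(s * WAYS + w) % 3 for w in range(WAYS)] for s in range(NUM_SETS)]

def true_occupancy(cache):
    """Exact per-ID line counts, scanning every set (the costly baseline)."""
    counts = {}
    for cache_set in cache:
        for owner in cache_set:
            counts[owner] = counts.get(owner, 0) + 1
    return counts

def sampled_occupancy(cache, stride):
    """Estimate per-ID occupancy from sampled sets, scaled to full size."""
    sampled = cache[::stride]
    counts = {}
    for cache_set in sampled:
        for owner in cache_set:
            counts[owner] = counts.get(owner, 0) + 1
    scale = len(cache) / len(sampled)
    return {mid: round(c * scale) for mid, c in counts.items()}

exact = true_occupancy(cache)
estimate = sampled_occupancy(cache, SAMPLE_STRIDE)
for mid in sorted(exact):
    err = abs(estimate[mid] - exact[mid]) / exact[mid]
    print(f"ID {mid}: exact={exact[mid]} sampled={estimate[mid]} err={err:.2%}")
```

In this toy configuration the sampled estimate lands within about 1% of the exact count while inspecting only a quarter of the sets, which is the intuition behind keeping both the monitoring overhead and the accuracy loss small.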