Online cache modeling for commodity multicore processors

Authors:
Richard West;Puneet Zaroo;Carl A. Waldspurger;Xiao Zhang
Affiliations:
Boston University, Boston, MA, USA;VMware, Inc., Palo Alto, CA, USA;VMware, Inc., Palo Alto, CA, USA;University of Rochester, Rochester, NY, USA
Venue:
ACM SIGOPS Operating Systems Review
Year:
2010

Citing 23
Cited 3

Reducing cache misses using hardware and software page placement

ICS '99 Proceedings of the 13th international conference on Supercomputing
Selective cache ways: on-demand cache resource allocation

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Reconfigurable caches and their application to media processing

Proceedings of the 27th annual international symposium on Computer architecture
Analytical cache models with applications to cache partitioning

ICS '01 Proceedings of the 15th international conference on Supercomputing
OS-Controlled Cache Predictability for Real-Time Systems

RTAS '97 Proceedings of the 3rd IEEE Real-Time Technology and Applications Symposium (RTAS '97)
Dynamic Partitioning of Shared Cache Memory

The Journal of Supercomputing
CQoS: a framework for enabling QoS in shared caches of CMP platforms

Proceedings of the 18th annual international conference on Supercomputing
Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Organizing the Last Line of Defense before Hitting the Memory Wall for CMPs

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Architectural support for operating system-driven CMP cache management

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Managing Distributed, Shared L2 Caches through OS-Level Page Allocation

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors

Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Cooperative cache partitioning for chip multiprocessors

Proceedings of the 21st annual international conference on Supercomputing
CacheScouts: Fine-Grain Monitoring of Shared Caches in CMP Platforms

PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Adaptive set pinning: managing shared caches in chip multiprocessors

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Cache-Aware Real-Time Scheduling on Multicore Platforms: Heuristics and a Case Study

ECRTS '08 Proceedings of the 2008 Euromicro Conference on Real-Time Systems
RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Evaluation techniques for storage hierarchies

IBM Systems Journal
Addressing shared resource contention in multicore processors via scheduling

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Hardware execution throttling for multi-core resource management

USENIX'09 Proceedings of the 2009 conference on USENIX Annual technical conference
A cache-partitioning aware replacement policy for chip multiprocessors

HiPC'06 Proceedings of the 13th international conference on High Performance Computing

Cuanta: quantifying effects of shared on-chip resource interference for consolidated virtual machines

Proceedings of the 2nd ACM Symposium on Cloud Computing
CPI2: CPU performance isolation for shared compute clusters

Proceedings of the 8th ACM European Conference on Computer Systems
Improving execution unit occupancy on SMT-based processors through hardware-aware thread scheduling

Future Generation Computer Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Modern chip-level multiprocessors (CMPs) contain multiple processor cores sharing a common last-level cache, memory interconnects, and other hardware resources. Workloads running on separate cores compete for these resources, often resulting in highlyvariable performance. It is generally desirable to co-schedule workloads that have minimal resource contention, in order to improve both performance and fairness. Unfortunately, commodity processors expose only limited information about the state of shared resources such as caches to the software responsible for scheduling workloads that execute concurrently. To make informed resourcemanagement decisions, it is important to obtain accurate measurements of per-workload cache occupancies and their impact on performance, often summarized by utility functions such as miss-ratio curves (MRCs) In this paper, we first introduce an efficient online technique for estimating the cache occupancy of individual software threads using only commonly-available hardware performance counters. We derive an analytical model as the basis of our occupancy estimation, and extend it for improved accuracy on modern cache configurations, considering the impact of set-associativity, line replacement policy, and memory locality effects. We demonstrate the effectiveness of occupancy estimation with a series of CMP simulations in which SPEC benchmarks execute concurrently on multiple cores. Leveraging our occupancy estimation technique, we also introduce a lightweight approach for online MRC construction, and demonstrate its effectiveness using a prototype implementation in the VMware ESX Server hypervisor. We present a series of experiments involving SPEC benchmarks, comparing the MRCs we construct online with MRCs generated offline in which various cache sizes are enforced via static page coloring