Vantage: scalable and efficient fine-grain cache partitioning

Authors:
Daniel Sanchez; Christos Kozyrakis
Affiliations:
Stanford University, Stanford, CA, USA; Stanford University, Stanford, CA, USA
Venue:
Proceedings of the 38th annual international symposium on Computer architecture
Year:
2011



Abstract

Cache partitioning has a wide range of uses in CMPs, from guaranteeing quality of service and controlled sharing to security-related techniques. However, existing cache partitioning schemes (such as way-partitioning) are limited to coarse-grain allocations, support only a few partitions, and reduce cache associativity, hurting performance. Hence, these techniques can be applied only to CMPs with 2-4 cores and fail to scale to tens of cores. We present Vantage, a novel cache partitioning technique that overcomes the limitations of existing schemes: caches can have tens of partitions with sizes specified at cache line granularity, while maintaining high associativity and strong isolation among partitions. Vantage leverages cache arrays with good hashing and associativity, which enable soft-pinning a large portion of cache lines. It enforces capacity allocations by controlling the replacement process. Unlike prior schemes, Vantage provides strict isolation guarantees by partitioning most (e.g., 90%) of the cache instead of all of it. Vantage is derived from analytical models, which allow us to provide strong guarantees and bounds on associativity and sizing independent of the number of partitions and their behaviors. It is simple to implement, requiring around 1.5% state overhead and simple changes to the cache controller. We evaluate Vantage using extensive simulations. On a 32-core system, using 350 multiprogrammed workloads and one partition per core, partitioning the last-level cache with conventional techniques degrades throughput for 71% of the workloads versus an unpartitioned cache (by 7% on average and up to 25%), even when using 64-way caches. In contrast, Vantage improves throughput for 98% of the workloads, by 8% on average (up to 20%), using a 4-way cache.
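
The granularity argument in the abstract can be illustrated with a little arithmetic. The sketch below is not the Vantage mechanism itself. It only contrasts the allocation limits of way-partitioning with size targets expressed in cache lines over a partly managed cache. The 8 MiB capacity, the 64-byte line size, the 3:1 share split, and all function names are assumptions chosen for illustration; the 90% managed portion follows the abstract's example figure.

```python
def way_partitioning_limits(cache_bytes, ways, line_bytes=64):
    """Allocation limits when each partition is given whole ways.

    cache_bytes and line_bytes are hypothetical parameters, not values
    taken from the paper.
    """
    total_lines = cache_bytes // line_bytes
    return {
        "total_lines": total_lines,
        "max_partitions": ways,                  # each partition needs at least one way
        "granularity_lines": total_lines // ways,  # smallest allocation step: one way
    }


def ways_per_partition(ways, partitions):
    """Ways (and hence associativity) left to each partition under an even split."""
    if partitions > ways:
        raise ValueError("way-partitioning cannot support more partitions than ways")
    return ways // partitions


def line_granularity_targets(total_lines, shares, managed_percent=90):
    """Per-partition size targets in cache lines over the managed portion only.

    managed_percent=90 mirrors the abstract's "e.g., 90%"; the shares are
    arbitrary illustrative weights.
    """
    managed_lines = total_lines * managed_percent // 100
    total_share = sum(shares)
    return [managed_lines * share // total_share for share in shares]


if __name__ == "__main__":
    limits = way_partitioning_limits(8 * 2**20, ways=64)
    print("way-partitioning:", limits)
    print("ways per partition with 32 partitions:", ways_per_partition(64, 32))
    print("line-granularity targets:",
          line_granularity_targets(limits["total_lines"], shares=[3, 1]))
```

Under these assumed parameters, a 64-way cache split evenly among 32 partitions leaves each partition only two ways, and allocations move in steps of a whole way, whereas line-granularity targets can be set to any number of lines within the managed portion.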