Achieving Non-Inclusive Cache Performance with Inclusive Caches: Temporal Locality Aware (TLA) Cache Management Policies

Authors:
Aamer Jaleel;Eric Borch;Malini Bhandaru;Simon C. Steely Jr.;Joel Emer
Affiliations:
-;-;-;-;-
Venue:
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2010

Citing 17
Cited 18

On the inclusion properties for multi-level cache hierarchies

ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
An evaluation of directory schemes for cache coherence

ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Cache replacement with dynamic exclusion

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Cache coherence directories for scalable multiprocessors

Cache coherence directories for scalable multiprocessors
Tradeoffs in two-level on-chip caching

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Piranha: a scalable architecture based on single-chip multiprocessing

Proceedings of the 27th annual international symposium on Computer architecture
Cache Memories

ACM Computing Surveys (CSUR)
Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Pin: building customized program analysis tools with dynamic instrumentation

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Performance evaluation of exclusive cache hierarchies

ISPASS '04 Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software
Reducing Verification Complexity of a Multicore Coherence Protocol Using Assume/Guarantee

FMCAD '06 Proceedings of the Formal Methods in Computer Aided Design
Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Adaptive insertion policies for managing shared caches

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches

Proceedings of the 36th annual international symposium on Computer architecture
Pseudo-LIFO: the foundation of a new family of replacement policies for last-level caches

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
High performance cache replacement using re-reference interval prediction (RRIP)

Proceedings of the 37th annual international symposium on Computer architecture

Bypass and insertion algorithms for exclusive last-level caches

Proceedings of the 38th annual international symposium on Computer architecture
CRUISE: cache replacement and utility-aware scheduling

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
PACMan: prefetch-aware cache management for high performance caching

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Why on-chip cache coherence is here to stay

Communications of the ACM
Defensive loop tiling for multi-core processor

Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Locality & utility co-optimization for practical capacity management of shared last level caches

Proceedings of the 26th ACM international conference on Supercomputing
FLEXclusion: balancing cache capacity and on-chip bandwidth via flexible exclusion

Proceedings of the 39th Annual International Symposium on Computer Architecture
Reducing last level cache pollution in NUMA multicore systems for improving cache performance

ICCSA'12 Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part III
Introducing hierarchy-awareness in replacement and bypass algorithms for last-level caches

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Exploiting reuse locality on inclusive shared last-level caches

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Improving Cache Management Policies Using Dynamic Reuse Distances

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Adaptive cache management for a combined SRAM and DRAM cache hierarchy for multi-cores

Proceedings of the Conference on Design, Automation and Test in Europe
The locality-aware adaptive cache coherence protocol

Proceedings of the 40th Annual International Symposium on Computer Architecture
The case for a scalable coherence protocol for complex on-chip cache hierarchies in many core systems

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
The reuse cache: downsizing the shared last-level cache

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
ARI: Adaptive LLC-memory traffic management

ACM Transactions on Architecture and Code Optimization (TACO)
Temporal-based multilevel correlating inclusive cache replacement

ACM Transactions on Architecture and Code Optimization (TACO)
Virtual machine consolidation based on interference modeling

The Journal of Supercomputing

Quantified Score

Hi-index	0.02

Visualization

Abstract

Inclusive caches are commonly used by processors to simplify cache coherence. However, the trade-off has been lower performance compared to non-inclusive and exclusive caches. Contrary to conventional wisdom, we show that the limited performance of inclusive caches is mostly due to inclusion victims—lines that are evicted from the core caches to satisfy the inclusion property—and not the reduced cache capacity of the hierarchy due to the duplication of data. These inclusion victims are incorrectly chosen for replacement because the last-level cache (LLC) is unaware of the temporal locality of lines in the core caches. We propose Temporal Locality Aware (TLA) cache management policies to allow an inclusive LLC to be aware of the temporal locality of lines in the core caches. We propose three TLA policies: Temporal Locality Hints (TLH), Early Core Invalidation (ECI), and Query Based Selection (QBS). All three policies improve inclusive cache performance without requiring any additional hardware structures. In fact, QBS performs similar to a non-inclusive cache hierarchy.