NCID: a non-inclusive cache, inclusive directory architecture for flexible and efficient cache hierarchies

Authors:
Li Zhao;Ravi Iyer;Srihari Makineni;Don Newell;Liqun Cheng
Affiliations:
Intel, Hillsboro, USA;Intel, Hillsboro, USA;Intel, Hillsboro, OR, USA;Intel, Hillsboro, USA;Intel, Hillsboro, USA
Venue:
Proceedings of the 7th ACM international conference on Computing frontiers
Year:
2010

Citing 19
Cited 1

On the inclusion properties for multi-level cache hierarchies

ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Characteristics of performance-optimal multi-level cache hierarchies

ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Tradeoffs in two-level on-chip caching

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Piranha: a scalable architecture based on single-chip multiprocessing

Proceedings of the 27th annual international symposium on Computer architecture
Cache Memories

ACM Computing Surveys (CSUR)
Token coherence: decoupling performance and correctness

Proceedings of the 30th annual international symposium on Computer architecture
CQoS: a framework for enabling QoS in shared caches of CMP platforms

Proceedings of the 18th annual international conference on Supercomputing
Managing Wire Delay in Large Chip-Multiprocessor Caches

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors

Proceedings of the 32nd annual international symposium on Computer Architecture
Adaptive Mechanisms and Policies for Managing Cache Hierarchies in Chip Multiprocessors

Proceedings of the 32nd annual international symposium on Computer Architecture
Optimizing Replication, Communication, and Capacity Allocation in CMPs

Proceedings of the 32nd annual international symposium on Computer Architecture
Interconnect-Aware Coherence Protocols for Chip Multiprocessors

Proceedings of the 33rd annual international symposium on Computer Architecture
ASR: Adaptive Selective Replication for CMP Caches

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Interconnect design considerations for large NUCA caches

Proceedings of the 34th annual international symposium on Computer architecture
Adaptive insertion policies for high performance caching

Proceedings of the 34th annual international symposium on Computer architecture
QoS policies and architecture for cache/memory in CMP platforms

Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Exploring Large-Scale CMP Architectures Using ManySim

IEEE Micro
Virtual Hierarchies

IEEE Micro
Nonuniform Cache Architectures for Wire-Delay Dominated On-Chip Caches

IEEE Micro

The reuse cache: downsizing the shared last-level cache

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

Chip-multiprocessor (CMP) architectures employ multi-level cache hierarchies with private L2 caches per core and a shared L3 cache like Intel's Nehalem processor and AMD's Barcelona processor. When designing a multi-level cache hierarchy, one of the key design choices is the inclusion policy: inclusive, non-inclusive or exclusive. Either choice has its benefits and drawbacks. An inclusive cache hierarchy (like Nehalem's L3) has the benefit of allowing incoming snoops to be filtered at the L3 cache, but suffers from (a) reduced space efficiency due to replication between the L2 and L3 caches and (b) reduced flexibility since it cannot bypass the L3 cache for transient or low priority data. In an inclusive L2/L3 cache hierarchy, it also becomes difficult to flexibly chop L3 cache size (or increase L2 cache size) for different product instantiations because the inclusion can start to affect performance (due to significant back-invalidates). In this paper, we present a novel approach to addressing the drawbacks of inclusive caches, while retaining its positive features of snoop filtering. We present NCID: a non-inclusive cache, inclusive directory architecture that allows data in the L3 to be non-inclusive or exclusive, but retains tag inclusion in the directory to support complete snoop filtering. We then describe and evaluate a range of NCID-based architecture options and policies. Our evaluation shows that NCID enables a flexible and efficient cache hierarchy for future CMP platforms and has the potential to improve performance significantly for several important server benchmarks.