Spatiotemporal Coherence Tracking
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
ZSim: fast and accurate microarchitectural simulation of thousand-core systems
Proceedings of the 40th Annual International Symposium on Computer Architecture
The locality-aware adaptive cache coherence protocol
Proceedings of the 40th Annual International Symposium on Computer Architecture
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Building expressive, area-efficient coherence directories
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Multi-grain coherence directories
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Using in-flight chains to build a scalable cache coherence protocol
ACM Transactions on Architecture and Code Optimization (TACO)
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Hi-index | 0.00 |
Large-scale CMPs with hundreds of cores require a directory-based protocol to maintain cache coherence. However, previously proposed coherence directories are hard to scale beyond tens of cores, requiring either excessive area or energy, complex hierarchical protocols, or inexact representations of sharer sets that increase coherence traffic and degrade performance. We present SCD, a scalable coherence directory that relies on efficient highly-associative caches (such as zcaches) to implement a single-level directory that scales to thousands of cores, tracks sharer sets exactly, and incurs negligible directory-induced invalidations. SCD scales because, unlike conventional directories, it uses a variable number of directory tags to represent sharer sets: lines with one or few sharers use a single tag, while widely shared lines use additional tags, so tags remain small as the system scales up. We show that, thanks to the efficient highly-associative array it relies on, SCD can be fully characterized using analytical models, and can be sized to guarantee a negligible number of evictions independently of the workload. We evaluate SCD using simulations of a 1024-core CMP. For the same level of coverage, we find that SCD is 13脳 more area-efficient than full-map sparse directories, and 2脳 more area-efficient and faster than hierarchical directories, while requiring a simpler protocol. Furthermore, we show that SCD's analytical models are accurate in practice.