Growing core counts have highlighted the need for scalable on-chip coherence mechanisms, because adding cores exposes the energy and area costs of scaling the coherence directories. Duplicate-tag directories require highly associative structures whose associativity grows with core count, so their power consumption becomes prohibitive and precludes scaling. Sparse directories overcome the power barrier by reducing directory associativity, but they must over-provision storage area to avoid high invalidation rates. We propose the Cuckoo directory, a power- and area-efficient scalable distributed directory. The Cuckoo directory scales to high core counts without the energy cost of wide associative lookups and without gross capacity over-provisioning. Simulation of a 16-core CMP running commercial server and scientific workloads shows that the Cuckoo directory eliminates invalidations while being up to four times more power-efficient than the Duplicate-tag directory, and 24% more power-efficient and up to seven times more area-efficient than the Sparse directory. Analytical projections indicate that the Cuckoo directory retains its energy and area benefits as core count increases, scaling efficiently to at least 1024 cores.
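
The abstract does not spell out the lookup and insertion mechanism. As a rough illustration of the general idea the name points to, the following is a minimal sketch of generic multi-way cuckoo hashing, assuming it is applied to a table that maps block tags to sharer sets. It is not the paper's design: the class name `CuckooTable`, the per-way hash in `_index`, the way count, the table size, and the displacement limit are all hypothetical choices made for the example.

```python
class CuckooTable:
    """Generic multi-way cuckoo hash table: block tag -> set of sharer IDs.

    Illustrative only. Sizes, hash function, and displacement limit are
    assumptions for this sketch, not parameters taken from the paper.
    """

    def __init__(self, num_ways=4, sets_per_way=64, max_displacements=32):
        self.num_ways = num_ways
        self.sets_per_way = sets_per_way
        self.max_displacements = max_displacements
        # One direct-mapped array per way; a slot holds (tag, sharers) or None.
        self.ways = [[None] * sets_per_way for _ in range(num_ways)]

    def _index(self, way, tag):
        # Hypothetical per-way hash: multiply by a different odd constant
        # for each way, keep 32 bits, and use the upper bits as the index.
        mult = 0x9E3779B1 + 2 * way
        return (((tag * mult) & 0xFFFFFFFF) >> 16) % self.sets_per_way

    def lookup(self, tag):
        # A lookup probes exactly one slot per way, so the probe width is
        # num_ways regardless of how many entries the table holds.
        for way in range(self.num_ways):
            slot = self.ways[way][self._index(way, tag)]
            if slot is not None and slot[0] == tag:
                return slot[1]
        return None

    def insert(self, tag, sharers):
        """Insert or update an entry.

        Returns None on success, or the (tag, sharers) entry that could not
        be placed once the displacement limit is reached.
        """
        # Update in place if the tag is already present.
        for way in range(self.num_ways):
            i = self._index(way, tag)
            slot = self.ways[way][i]
            if slot is not None and slot[0] == tag:
                self.ways[way][i] = (tag, sharers)
                return None

        entry = (tag, sharers)
        way = 0
        for _ in range(self.max_displacements):
            # Prefer any empty candidate slot for the entry in hand.
            for w in range(self.num_ways):
                i = self._index(w, entry[0])
                if self.ways[w][i] is None:
                    self.ways[w][i] = entry
                    return None
            # All candidates are full: take the slot in the current way and
            # try to re-home the displaced occupant on the next iteration.
            i = self._index(way, entry[0])
            entry, self.ways[way][i] = self.ways[way][i], entry
            way = (way + 1) % self.num_ways
        # Gave up: the caller has to drop this entry.
        return entry
```

In this sketch, an entry returned by `insert` corresponds to a forced eviction, which in a coherence directory would mean invalidating the cached copies that entry tracks. Displacing existing entries to their alternative slots is what lets a table of low associativity reach high occupancy before such evictions occur.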