The locality-aware adaptive cache coherence protocol
Proceedings of the 40th Annual International Symposium on Computer Architecture
Protozoa: adaptive granularity cache coherence
Proceedings of the 40th Annual International Symposium on Computer Architecture
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Building expressive, area-efficient coherence directories
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Multi-grain coherence directories
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Hi-index | 0.00 |
One of the key scalability challenges of on-chip coherence in a multicore chip is the coherence directory, which provides information on sharing of cache blocks. Shadow tags that duplicate entire private cache tag arrays are widely used to minimize area overhead, but require an energy-intensive associative search to obtain the sharing information. Recent research proposed a Tagless directory, which uses bloom filters to summarize the tags in a cache set. The Tagless directory associates the sharing vector with the bloom filter buckets to completely eliminate the associative lookup and reduce the directory overhead. However, Tagless still uses a full map sharing vector to represent the sharing information, resulting in remaining area and energy challenges with increasing core counts. In this paper, we first show that due to the regular nature of applications, many bloom filters essentially replicate the same sharing pattern. We next exploit the pattern commonality and propose SPATL (Sharing-pattern based Tagless Directory). SPATL exploits the sharing pattern commonality to decouple the sharing patterns from the bloom filters and eliminates the redundant copies of sharing patterns. SPATL works with both inclusive and noninclusive shared caches and provides 34% storage savings over Tagless, the previous most storage-efficient directory, at 16 cores. We study multiple strategies to periodically eliminate the false sharing that comes from combining sharing pattern compression with Tagless, and demonstrate that SPATL can achieve the same level of false sharers as Tagless with 5% extra bandwidth. Finally, we demonstrate that SPATL scales even better than an idealized directory and can support 1024-core chips with less than 1% of the private cache space for data parallel applications.