Chip multiprocessors require a coherence directory to track data sharing and to order accesses to shared data. Scaling coherence directories to a large number of cores is challenging because of their excessive area requirements. State-of-the-art proposals reduce directory size by not keeping coherence information for private data. These approaches help workloads whose data are predominantly private, but they do not apply to workloads with shared data. We observe that data are rarely shared actively by multiple cores: in workloads with a shared dataset, although each core accesses the whole dataset, the chance that multiple cores access the same piece of data at the same time is low. Based on this observation, we design a Spatiotemporal Coherence Tracking scheme that drastically reduces directory size without sacrificing performance. The proposed directory uses dual-grain tracking and switches between the two granularities whenever possible to save area. It dynamically detects spatial regions of data that are accessed privately by one core over a period of time and, for those regions, coarsens the coherence tracking granularity from block level to region level. Our experimental results show that the proposed approach reduces the size of the baseline sparse directory by at least 75% across a variety of commercial and scientific workloads, while sacrificing only 1% of performance. With our approach, the directory can be under-provisioned, that is, it can have fewer entries than the number of cache blocks being tracked.
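The dual-grain idea can be illustrated with a toy model. The sketch below is not the paper's design: the class and method names, the block and region sizes, and the per-region record of touched blocks are all assumptions made for illustration. It tracks a region with a single entry while only one core accesses it, and falls back to per-block entries once a second core touches the region. Switching back to region grain and entry eviction are omitted.

```python
from dataclasses import dataclass, field

BLOCK_BYTES = 64          # assumed cache-block size
BLOCKS_PER_REGION = 16    # assumed region size (16 blocks = 1 KiB)


@dataclass
class RegionEntry:
    owner: int                                  # the only core seen so far
    blocks: set = field(default_factory=set)    # blocks the owner touched


class DualGrainDirectory:
    """Toy directory: one entry per private region, one per shared block."""

    def __init__(self):
        self.regions = {}            # region id -> RegionEntry
        self.blocks = {}             # block id  -> set of sharer core ids
        self.shared_regions = set()  # regions now tracked at block grain

    def access(self, core, addr):
        block = addr // BLOCK_BYTES
        region = block // BLOCKS_PER_REGION

        if region not in self.shared_regions:
            entry = self.regions.get(region)
            if entry is None:
                # First touch: track the whole region with a single entry.
                entry = self.regions[region] = RegionEntry(owner=core)
            if entry.owner == core:
                entry.blocks.add(block)
                return
            # A second core arrived: switch the region to block grain.
            del self.regions[region]
            self.shared_regions.add(region)
            for b in entry.blocks:
                self.blocks[b] = {entry.owner}

        # Block-grain tracking for regions that are (or just became) shared.
        self.blocks.setdefault(block, set()).add(core)

    def entries(self):
        """Number of directory entries currently in use."""
        return len(self.regions) + len(self.blocks)
```

In this model, a core that streams through a whole region by itself occupies one directory entry rather than sixteen, which is the kind of saving that lets a directory hold fewer entries than the number of tracked blocks. The savings reported in the abstract come from the authors' evaluation, not from this sketch.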