Coherence Controller Architectures for Scalable Shared-Memory Multiprocessors
IEEE Transactions on Computers - Special issue on cache memory and related problems
Hardware spatial forwarding for widely shared data
Proceedings of the 14th international conference on Supercomputing
A Novel Approach to Reduce L2 Miss Latency in Shared-Memory Multiprocessors
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Adaptive Proxies: Handling Widely-Shared Data in Shared-Memory Multiprocessors (Research Note)
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
IEEE Transactions on Parallel and Distributed Systems
A Two-Level Directory Architecture for Highly Scalable cc-NUMA Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Proximity-aware directory-based coherence for multi-core processor architectures
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
An Efficient Lightweight Shared Cache Design for Chip Multiprocessors
APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
A Novel Cache Organization for Tiled Chip Multiprocessor
APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
A Methodology to Characterize Critical Section Bottlenecks in DSM Multiprocessors
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
High-throughput coherence control and hardware messaging in everest
IBM Journal of Research and Development
WAYPOINT: scaling coherence to thousand-core architectures
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing
Hi-index | 0.00 |
Recent research shows that the occupancy of the coherence controllers is a major performance bottleneck for distributed cache coherent shared memory multiprocessors. A significant part of the occupancy is due to the latency of accessing the directory, which is usually kept in DRAM memory. Most coherence controller designs that use protocol processors for executing the coherence protocol handlers use the data cache of the protocol processor for caching directory entries along with protocol handler data. Analogously, a fast Directory Cache (DC) can also be used by the hardwired coherence controller designs in order to minimize directory access time. However, the existing hardwired controllers do not use a directory cache. Moreover, the performance impact of caching directory entries has not been studied in the literature before.This paper studies the performance of directory caches using parallel applications from the SPLASH-2 suite. We demonstrate that using a directory cache can result in 40% or more improvement in the execution time of applications that are communication intensive. We also investigate in detail the various directory cache design parameters: cache size, cache line size, and associativity. Our experimental results show that the directory cache size requirements grow sub-linearly with the increase in the application's data set size. The results also show the performance advantage of multi-entry directory cache lines, as a result of spatial locality and the absence of sharing of directories. The impact of the associativity of the directory caches on performance is less than that of the size and the line size.Also, we find a clear linear relation between the directory cache miss ratio and the coherence controller occupancy, and between both measures and the execution time of the applications, which can help system architects evaluate the impact of directory cache (or coherence controller) designs on overall system performance.