An evaluation of directory schemes for cache coherence
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Analysis of cache invalidation patterns in multiprocessors
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Computer
LimitLESS directories: A scalable cache coherence scheme
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Cooperative shared memory: software and hardware for scalable multiprocessor
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
A scalable coherent cache system with a dynamic pointing scheme
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Cache coherence in large-scale shared-memory multiprocessors: issues and comparisons
ACM Computing Surveys (CSUR)
Extending the scalable coherent interface for large-scale shared-memory multiprocessors
Extending the scalable coherent interface for large-scale shared-memory multiprocessors
Mechanisms for cooperative shared memory
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Software-extended coherent shared memory: performance and cost
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
IEEE Standard for Scalable Coherent Interface, Science: IEEE Std. 1596-1992
IEEE Standard for Scalable Coherent Interface, Science: IEEE Std. 1596-1992
The DASH Prototype: Logic Overhead and Performance
IEEE Transactions on Parallel and Distributed Systems
Kiloprocessor Extensions to SCI
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
PROTEUS: A HIGH-PERFORMANCE PARALLEL-ARCHITECTURE SIMULATOR
PROTEUS: A HIGH-PERFORMANCE PARALLEL-ARCHITECTURE SIMULATOR
SPLASH: Stanford parallel applications for shared-memory*
SPLASH: Stanford parallel applications for shared-memory*
ADir_pNB: A Cost-Effective Way to Implement Full Map Directory-Based Cache Coherence Protocols
IEEE Transactions on Computers
A Two-Level Directory Architecture for Highly Scalable cc-NUMA Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
A consistency architecture for hierarchical shared caches
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
On Scalable Synchronization for Distributed Embedded Real-Time Systems
SEUS '08 Proceedings of the 6th IFIP WG 10.2 international workshop on Software Technologies for Embedded and Ubiquitous Systems
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
A two-level directory organization solution for CC-NUMA systems
ICA3PP'07 Proceedings of the 7th international conference on Algorithms and architectures for parallel processing
A new hybrid directory scheme for shared memory multi-processors
CSR'06 Proceedings of the First international computer science conference on Theory and Applications
Hi-index | 14.98 |
Directory schemes have long been used to solve the cache coherence problem for large scale shared memory multiprocessors. In addition, tree-based protocols have been employed to reduce the directory size and the invalidation latency for a large degree of data sharing in the system. However, the existing tree-based protocols involve a very high communication overhead for maintaining a balanced tree, especially when the degree of data sharing is low. This paper presents a new tree-based cache coherence protocol which is a hybrid of the limited directory and the linked list schemes. By utilizing a limited number of pointers in the directory, the proposed protocol connects the nodes caching a shared block in a tree fashion without incurring any communication overhead. In addition to the low communication overhead, the proposed scheme also possesses the advantages of the existing bit-map and tree-based linked list protocols, namely, scalable memory requirement and logarithmic invalidation latency. We evaluate the performance of our protocol by running four applications on the Proteus execution-driven simulator. Our simulation results show that the performance of the proposed protocol is very close to that of the full-map protocol.