An evaluation of directory schemes for cache coherence
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Evaluating the performance of software cache coherence
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Analysis of cache invalidation patterns in multiprocessors
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Hector: A Hierarchically Structured Shared-Memory Multiprocessor
Computer - Special issue on experimental research in computer architecture
LimitLESS directories: A scalable cache coherence scheme
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Coarse-grain parallel programming in Jade
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Comparison of hardware and software cache coherence schemes
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Implementation and performance of Munin
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
Lazy release consistency for software distributed shared memory
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Distributed shared memory with versioned objects
OOPSLA '92 conference proceedings on Object-oriented programming systems, languages, and applications
Cooperative shared memory: software and hardware for scalable multiprocessor
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
The shared regions approach to software cache coherence on multiprocessors
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Performance evaluation of hybrid hardware and software distributed shared memory protocols
ICS '94 Proceedings of the 8th international conference on Supercomputing
Memory consistency and event ordering in scalable shared-memory multiprocessors
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
The directory-based cache coherence protocol for the DASH multiprocessor
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
A comprehensive bibliography of distributed shared memory
ACM SIGOPS Operating Systems Review
A Web Proxy Cache Coherency and Replacement Approach
WI '01 Proceedings of the First Asia-Pacific Conference on Web Intelligence: Research and Development
Performance Analysis of Database Systems
Performance Evaluation: Origins and Directions
Hi-index | 0.00 |
Dynamic software cache coherence strategies use information about program sharing behaviour to manage caches at run-time and at a granularity defined by the application. The program-level information is obtained through annotations placed into the application by the user or the compiler. The coherence protocols may range from simple static algorithms to dynamic algorithms that use run-time data structures similar to the directories used in hardware strategies. In this paper, we present an analytic study of five dynamic software cache coherence algorithms and compare these to a representative hardware coherence strategy. The analytic model is constructed using four input parameters --- write probability, locality, granularity, and system size --- and solved by analysis of a Markov chain. We show that the fundamental tradeoffs between the different hardware and software strategies are captured in this model. The results of the study show that hardware schemes perform better for fine-grained data structures for much of the parameter space that we study. However, for coarse-grained data structures, various software algorithms are dominant over most of the parameter space. Further, hardware strategies are found to be more susceptible to the effects of contention, and also perform worse for the asymmetric workload that we study.