Both hardware-controlled and compiler-directed mechanisms have been proposed for maintaining cache coherence in large-scale shared-memory multiprocessors, but both approaches have significant limitations. We examine the potential performance improvement of a new combined software-hardware cache coherence mechanism. This approach augments the run-time information available to a directory-based coherence mechanism with compile-time analysis that statically identifies write references that cannot cause coherence problems and writes that should be written through to memory. These references are marked as not needing to send invalidation messages, which reduces the network traffic produced by the directory while maintaining cache consistency. For memory references that are ambiguous, for instance because of conditional branches or because resolving them would require complex data flow analysis, the compiler marks the references conservatively and relies on the hardware directory to ensure coherence. Trace-driven simulations are used to emulate the compile-time analysis on memory traces and to estimate the performance improvement that could be expected from a compiler performing this optimization on the Perfect Club benchmark programs. By reducing the number of invalidations, this optimized directory scheme can reduce processor-memory network traffic by up to 54 percent compared to an unoptimized directory mechanism. In addition, the overall miss ratio can be reduced by up to 42 percent due to a corresponding reduction in the number of write misses.
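The idea of marking writes so that the directory can skip invalidation messages can be illustrated with a toy trace-driven counter. This is only a minimal sketch under stated assumptions, not the simulator or compiler analysis used in the paper: the names `Mark`, `ToyDirectory`, `count_invalidations`, and the six-entry trace are all hypothetical, the marks are assigned by hand rather than derived by any analysis, and the model counts messages only (it tracks no data values, write-through behavior, or miss ratios).

```python
from dataclasses import dataclass, field
from enum import Enum


class Mark(Enum):
    # Hypothetical compiler annotations attached to each write in the trace.
    NORMAL = "normal"                # ambiguous write: the directory handles coherence
    NO_INVALIDATE = "no_invalidate"  # assumed proven unable to cause a coherence problem


@dataclass
class ToyDirectory:
    sharers: dict = field(default_factory=dict)  # block -> set of processor ids
    invalidations_sent: int = 0

    def read(self, proc, block):
        # Record that this processor now holds a copy of the block.
        self.sharers.setdefault(block, set()).add(proc)

    def write(self, proc, block, mark):
        holders = self.sharers.setdefault(block, set())
        if mark is Mark.NORMAL:
            # One invalidation message per other processor holding the block.
            self.invalidations_sent += len(holders - {proc})
            holders.clear()
        # Marked writes send no messages; the sharer list is left as it was.
        holders.add(proc)


def count_invalidations(trace, honor_marks):
    """Replay a trace; with honor_marks=False every write is treated as NORMAL."""
    directory = ToyDirectory()
    for op, proc, block, mark in trace:
        if op == "r":
            directory.read(proc, block)
        else:
            directory.write(proc, block, mark if honor_marks else Mark.NORMAL)
    return directory.invalidations_sent


# Made-up trace of (operation, processor, block, mark) entries.
TRACE = [
    ("r", 0, "A", None),
    ("r", 1, "A", None),
    ("w", 0, "A", Mark.NORMAL),
    ("r", 1, "B", None),
    ("r", 2, "B", None),
    ("w", 0, "B", Mark.NO_INVALIDATE),
]

baseline = count_invalidations(TRACE, honor_marks=False)
optimized = count_invalidations(TRACE, honor_marks=True)
print(baseline, optimized)  # prints "3 1"
```

Replaying the same trace twice, once ignoring the marks and once honoring them, mirrors the comparison between an unoptimized and an optimized directory. The counts come from the invented trace and carry no relation to the 54 percent and 42 percent figures reported above.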