A cache coherence scheme with fast selective invalidation
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Efficiently computing static single assignment form and the control dependence graph
ACM Transactions on Programming Languages and Systems (TOPLAS)
Adjustable block size coherent caches
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Life span strategy—a compiler-based approach to cache coherence
ICS '92 Proceedings of the 6th international conference on Supercomputing
Cache coherence using local knowledge
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
A version control approach to Cache coherence
ICS '89 Proceedings of the 3rd international conference on Supercomputing
Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Formalized methodology for data reuse exploration in hierarchical memory mappings
ISLPED '97 Proceedings of the 1997 international symposium on Low power electronics and design
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
Power and Speed-Efficient Code Transformation of Video Compression Algorithms for RISC Processors
Journal of VLSI Signal Processing Systems - Special issue on multimedia signal processing
Eliminating Stale Data References through Array Data-Flow Analysis
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Exact Distributed Invalidation
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Towards general and exact distributed invalidation
Journal of Parallel and Distributed Computing
BarrierWatch: characterizing multithreaded workloads across and within program-defined epochs
Proceedings of the 8th ACM International Conference on Computing Frontiers
Hi-index | 0.00 |
In this paper, we introduce a compiler-directed coherence scheme which can exploit most of the temporal and spatial locality across task boundaries. It requires only an extended tag field per cache word, one modified memory access instruction, and a counter called the epoch counter in each processor. By using the epoch counter as a system-wide version number, the scheme simplifies the cache hardware of previous version control [5] or timestamp-based schemes [12], but still exploits most of the temporal and spatial locality across task boundaries. We present a compiler algorithm to generate the appropriate memory access instructions for the proposed scheme. The algorithm is based on a data flow analysis technique. It identifies potential stale references by examining memory reference patterns in a source program.