Access latency in large-scale shared-memory multiprocessors is a concern because most, if not all, memory is one or more hops away through an interconnection network. Giving each processor one or more levels of cache is an accepted way to reduce the average access latency; in a multiprocessor, however, cached values must be kept coherent if the machine is to support the abstraction of a shared global memory. There is no generally accepted hardware solution that provides cache coherence for large-scale shared-memory multiprocessors. Software coherence strategies offer scalability with current hardware. In this paper we examine a compiler-based software strategy for maintaining cache coherence that relies on dependence analysis and a vectorization algorithm to insert cache control directives. Experiments on the BBN TC2000 with a pair of numerical problems show that the run-time cost of coherence under our strategy is lower than that of previously proposed compiler-based software methods, and suggest that it should compare favorably with proposed hardware schemes.
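To make the coherence problem concrete, the following toy model shows how a private cache without hardware coherence can return a stale value, and how an explicit invalidate directive placed before the dependent read avoids it. This is only an illustrative sketch, not the algorithm examined in the paper: the `Memory` and `Cache` classes, the `run` function, and the write-through policy are all assumptions made for the example.

```python
class Memory:
    """Shared global memory, modeled as address -> value (hypothetical)."""

    def __init__(self):
        self.cells = {}


class Cache:
    """Private per-processor cache with NO hardware coherence (hypothetical)."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def read(self, addr):
        # On a miss, fetch from shared memory; on a hit, trust the local copy.
        if addr not in self.lines:
            self.lines[addr] = self.memory.cells.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        # Assumption: write-through, so shared memory is always up to date.
        self.lines[addr] = value
        self.memory.cells[addr] = value

    def invalidate(self, addr):
        # Cache control directive: discard the local copy, if any.
        self.lines.pop(addr, None)


def run(with_directive):
    """Processor P1 reads x, P0 writes x, then P1 reads x again."""
    mem = Memory()
    p0, p1 = Cache(mem), Cache(mem)
    p1.read("x")           # P1 caches the initial value of x
    p0.write("x", 42)      # P0 produces a new value of x
    if with_directive:
        # A compiler that detects the cross-processor dependence on x
        # could place an invalidate here, before the dependent read.
        p1.invalidate("x")
    return p1.read("x")


print(run(with_directive=False))  # stale read from P1's cache
print(run(with_directive=True))   # fresh read from shared memory
```

Without the directive, P1 keeps using its old cached copy of `x`; with it, the second read misses and fetches the value P0 wrote. A compiler-based scheme has to decide statically where such directives are needed, which is the role the abstract assigns to dependence analysis.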