A cache coherence scheme with fast selective invalidation
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Comparison of hardware and software cache coherence schemes
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Adjustable block size coherent caches
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Life span strategy—a compiler-based approach to cache coherence
ICS '92 Proceedings of the 6th international conference on Supercomputing
Cache coherence in large-scale shared-memory multiprocessors: issues and comparisons
ACM Computing Surveys (CSUR)
The cedar system and an initial performance study
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Limitations of cache prefetching on a bus-based multiprocessor
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Cache coherence using local knowledge
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Execution-driven tools for parallel simulation of parallel architectures and applications
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
CAT—caching address tags: a technique for reducing area cost of on-chip caches
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A compiler-directed distributed shared memory system
ICS '95 Proceedings of the 9th international conference on Supercomputing
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
A version control approach to Cache coherence
ICS '89 Proceedings of the 3rd international conference on Supercomputing
Issues related to MIMD shared-memory computers: the NYU ultracomputer approach
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
A compiler-directed cache coherence scheme with improved intertask locality
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
False Sharing and Spatial Locality in Multiprocessor Caches
IEEE Transactions on Computers
Design and Analysis of a Scalable Cache Coherence Scheme Based on Clocks and Timestamps
IEEE Transactions on Parallel and Distributed Systems
Eliminating Stale Data References through Array Data-Flow Analysis
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
An economical solution to the cache coherence problem
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Compiler analysis for cache coherence: Interprocedural array data-flow analysis and its impacts on cache performance
(R) Program Analysis for Cache Coherence: Beyond Procedural Boundaries
ICPP '96 Proceedings of the Proceedings of the 1996 International Conference on Parallel Processing - Volume 3
Hardware and compiler support for cache coherence in large-scale shared-memory multiprocessors
Hardware and compiler support for cache coherence in large-scale shared-memory multiprocessors
International Journal of Parallel Programming
Hi-index | 0.00 |
In this paper, we study a hardware-supported, compiler-directed (HSCD) cache coherence scheme, which can be implemented on a large-scale multiprocessor using off-the-shelf microprocessors, such as the Cray T3D. The scheme can be adapted to various cache organizations, including multiword cache lines and byte-addressable architectures. Several system related issues, including critical sections, interthread communication, and task migration have also been addressed. The cost of the required hardware support is minimal and proportional to the cache size. The necessary compiler algorithms, including intra- and interprocedural array data flow analysis, have been implemented on the Polaris parallelizing compiler [34]. From our simulation study using the Perfect Club benchmarks [5], we found that in spite of the conservative analysis made by the compiler, for four of six benchmark programs tested, the proposed HSCD scheme outperforms the full-map hardware directory scheme up to 70 percent while the hardware scheme outperforms the HSCD scheme in the remaining two applications up to 89 percent. Given its comparable performance and reduced hardware cost, the proposed scheme can be a viable alternative for large-scale multiprocessors such as the Cray T3D, which rely on users to maintain data coherence.