Enforcing cache coherence and reducing or hiding memory latency are important problems in the design of large-scale shared-memory multiprocessors. In this paper, we propose a compiler-directed cache coherence scheme that uses data prefetching. The Cache Coherence with Data Prefetching (CCDP) scheme uses compiler analysis techniques to identify potentially stale data references, that is, references that may access invalid copies of cached data. The key idea of the CCDP scheme is to enforce cache coherence by prefetching the up-to-date data for these potentially stale references from main memory.

We conducted application case studies to quantify the performance potential of the CCDP scheme on a real system. We applied the CCDP scheme to four benchmark programs from the SPEC CFP95 and CFP92 suites and executed them on the Cray T3D. The experimental results show that, for the programs studied, our scheme provides significant performance improvements by caching shared data and reducing the remote shared-memory access penalty incurred by the programs.
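The key idea stated above, refreshing potentially stale references from main memory before they are used, can be illustrated with a toy model. The sketch below is not the paper's implementation: `ToyCache`, `prefetch_fresh`, `run_epoch`, and the `potentially_stale` set are hypothetical names, and the set stands in for the output of the compiler analysis, which is simply assumed here.

```python
class ToyCache:
    """Toy private cache with no hardware coherence (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory  # shared main memory: address -> value
        self.lines = {}       # locally cached copies: address -> value

    def read(self, addr):
        # Ordinary cached read: a hit returns the local copy, even if stale.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def prefetch_fresh(self, addr):
        # Coherence-enforcing prefetch: overwrite any local copy with
        # the up-to-date value from main memory.
        self.lines[addr] = self.memory[addr]


def run_epoch(cache, addrs, potentially_stale):
    """Read addrs after refreshing those flagged as potentially stale.

    potentially_stale is assumed to come from compiler analysis.
    """
    for a in addrs:
        if a in potentially_stale:
            cache.prefetch_fresh(a)
    return [cache.read(a) for a in addrs]


memory = {0: 10, 1: 20}
cache = ToyCache(memory)
cache.read(0)
cache.read(1)            # warm the cache with both addresses
memory[0] = 99           # another processor updates address 0

stale_view = [cache.read(0), cache.read(1)]                    # plain reads
fresh_view = run_epoch(cache, [0, 1], potentially_stale={0})   # with refresh
```

In this model the plain reads still see the old value at address 0, while the epoch that prefetches the flagged address sees the update. Address 1 is not flagged, so it continues to be served from the cache, which is where the benefit of caching shared data would come from.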