Enforcing cache coherence and reducing or hiding memory latency are important and challenging problems in the design of large-scale distributed shared-memory (DSM) multiprocessors. We propose an integrated approach that addresses both problems through a compiler-directed cache coherence scheme called Cache Coherence with Data Prefetching (CCDP). The CCDP scheme enforces cache coherence by prefetching the potentially stale references in a parallel program. It also prefetches the non-stale references to hide their memory latencies. To optimize the performance of the CCDP scheme, prefetch hardware support is provided to handle these two forms of data prefetching efficiently. We also developed the compiler techniques that the CCDP scheme uses for stale reference detection, prefetch target analysis, and prefetch scheduling. We evaluated the performance of the CCDP scheme via execution-driven simulations of several applications from the SPEC CFP95 and CFP92 benchmark suites. The simulation results show that the CCDP scheme provides significant performance improvements for the benchmark programs studied.
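The idea of enforcing coherence by prefetching potentially stale references can be illustrated with a toy model. The sketch below is not the CCDP implementation. It assumes write-through private caches with no hardware coherence, and the names `Processor`, `coherent_prefetch`, and `run_epochs` are hypothetical, introduced only for illustration.

```python
# Toy model (illustrative assumption, not the CCDP scheme itself):
# one shared memory, one private cache per processor, no hardware coherence.
class Processor:
    def __init__(self, memory):
        self.memory = memory   # shared dict: address -> value
        self.cache = {}        # private cache: address -> possibly stale copy

    def read(self, addr):
        # Ordinary load: a cache hit returns the cached copy, fresh or not.
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def write(self, addr, value):
        # Assumed write-through policy, so memory always holds the latest value.
        self.cache[addr] = value
        self.memory[addr] = value

    def coherent_prefetch(self, addr):
        # Hypothetical operation: drop any cached copy and refetch from memory,
        # so the following read cannot observe a stale value.
        self.cache.pop(addr, None)
        self.cache[addr] = self.memory[addr]


def run_epochs(use_prefetch):
    memory = {0: 1}
    p0, p1 = Processor(memory), Processor(memory)
    p1.read(0)        # epoch 1: p1 caches the value 1
    p0.write(0, 2)    # epoch 2: p0 updates the same address
    # Epoch 3: assume the compiler marked p1's read as potentially stale.
    if use_prefetch:
        p1.coherent_prefetch(0)
    return p1.read(0)


print(run_epochs(False))  # 1 (stale cached copy)
print(run_epochs(True))   # 2 (fresh value after the prefetch)
```

In this model the prefetch is issued immediately before the read. A real scheme would schedule it earlier so that the refetch overlaps with computation, which is how the same operation can both enforce coherence and hide latency.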