Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Optimizing embedded applications using programmer-inserted hints
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
Compiler optimization techniques for OpenMP programs
Scientific Programming
As the difference in speed between processor and memory system continues to increase, it is becoming crucial to develop and refine techniques that enhance the effectiveness of cache hierarchies. One promising technique in the context of scalable shared-memory multiprocessors is data forwarding. Forwarding hides the latency of communication-induced misses by having producer processors send data to the caches of potential consumer processors in advance. Forwarding can hide the latency effectively, has low instruction overhead, and uses few machine resources. This paper presents a complete implementation of a data forwarding pass in an industrial-strength parallelizing compiler. Complete Fortran applications are analyzed for dependences and, based on the analysis, automatically annotated with forwarding directives. We propose a forwarding framework that includes four new instructions: write-forward, write-broadcast, write-update, and write-through. We also propose new micro-architectural support. In our analysis, we assume that the assignment of loop iterations to processors is known. We perform simulations of multiprocessors with different cache, memory, machine sharing, and process migration parameters. We conclude that data forwarding delivers large speedups (six 32-processor applications ran 40% faster on average), gets close to the upper bound in performance, and needs compiler support of only medium complexity.
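To make the role of the known iteration-to-processor assignment concrete, the following is a minimal sketch, not the paper's compiler pass. It assumes a static block schedule, a producer loop that writes a[i] in iteration i, and a consumer loop whose iteration j reads a[j + d] for a few constant offsets d. The names block_owner and plan_forwards are hypothetical. For each written element, the sketch lists the remote processors that will later read it, which is the information a forwarding directive would need. How such a list maps onto the four proposed instructions is not specified here.

```python
def block_owner(i, n, p):
    """Processor that runs iteration i of an n-iteration loop on p processors.

    Assumption: static block schedule (contiguous chunks of ceil(n / p)).
    """
    chunk = (n + p - 1) // p
    return i // chunk


def plan_forwards(n, p, read_offsets):
    """Map each written index i to the sorted list of remote consumer processors.

    Producer loop: iteration i writes a[i].
    Consumer loop: iteration j reads a[j + d] for every d in read_offsets.
    Elements that are only read by their own producer are omitted.
    """
    plan = {}
    for i in range(n):
        producer = block_owner(i, n, p)
        consumers = set()
        for d in read_offsets:
            j = i - d  # consumer iteration j reads a[j + d], which is a[i]
            if 0 <= j < n:
                consumers.add(block_owner(j, n, p))
        consumers.discard(producer)  # local reads need no forwarding
        if consumers:
            plan[i] = sorted(consumers)
    return plan


if __name__ == "__main__":
    # Three-point stencil over 8 elements on 2 processors.
    for index, procs in plan_forwards(8, 2, (-1, 0, 1)).items():
        print(f"a[{index}] should be forwarded to processors {procs}")
```

In this toy case only the two elements on the chunk boundary have remote consumers, which illustrates why forwarding can have low instruction overhead when the schedule is known.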