CFL (Communication Fusion Library) is a C++ library for MPI programmers. It uses overloading to distinguish private variables from replicated, shared variables, and automatically introduces the MPI communication needed to keep the replicated data consistent. This paper concerns a simple but surprisingly effective technique that improves performance substantially: CFL operators are executed lazily in order to expose opportunities for run-time, context-dependent optimisations such as message aggregation and operator fusion. We evaluate the idea in the context of a large-scale simulation of oceanic plankton ecology. The results demonstrate the software engineering benefits that accrue from the CFL abstraction, and show that in many cases performance close to that of manually optimised code can be achieved automatically.
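The lazy-execution idea can be illustrated with a minimal C++ sketch. It is not CFL's actual interface: the names `FakeComm`, `LazyContext`, `Shared` and `collectives_for_three_updates` are hypothetical, and the communication layer is a single-process stand-in that only counts the collective operations that would be issued rather than calling MPI. Under those assumptions, an overloaded `+=` on a shared variable merely records the local contribution, and the first read forces all pending updates in one aggregated exchange.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a communication layer; not MPI and not CFL.
// It only counts how many collective operations would be issued.
struct FakeComm {
    int collectives = 0;
    // Pretend to sum `buf` element-wise across ranks. With a single
    // simulated rank the data is unchanged; only the call is counted.
    void allreduce_sum(std::vector<double>& buf) {
        (void)buf;
        ++collectives;
    }
};

class LazyContext;

// A replicated ("shared") value. Updates are recorded, not communicated.
class Shared {
public:
    Shared(LazyContext& ctx, double initial) : ctx_(ctx), value_(initial) {}
    Shared(const Shared&) = delete;  // the context stores pointers to us

    Shared& operator+=(double local_contribution);  // lazy: no communication
    double get();                                   // forces pending updates

private:
    friend class LazyContext;
    LazyContext& ctx_;
    double value_;
    double pending_ = 0.0;
    bool dirty_ = false;
};

// Collects pending updates so they can be sent as one aggregated message.
class LazyContext {
public:
    explicit LazyContext(FakeComm& comm) : comm_(comm) {}

    void enqueue(Shared* s) { queue_.push_back(s); }

    void flush() {
        if (queue_.empty()) return;  // nothing pending: no communication
        std::vector<double> buf;
        for (Shared* s : queue_) buf.push_back(s->pending_);
        comm_.allreduce_sum(buf);    // one collective for all variables
        for (std::size_t i = 0; i < queue_.size(); ++i) {
            queue_[i]->value_ += buf[i];
            queue_[i]->pending_ = 0.0;
            queue_[i]->dirty_ = false;
        }
        queue_.clear();
    }

private:
    FakeComm& comm_;
    std::vector<Shared*> queue_;
};

Shared& Shared::operator+=(double local_contribution) {
    pending_ += local_contribution;  // repeated updates are merged locally
    if (!dirty_) {
        dirty_ = true;
        ctx_.enqueue(this);
    }
    return *this;
}

double Shared::get() {
    ctx_.flush();
    return value_;
}

// Four updates to three shared variables, then three reads.
int collectives_for_three_updates() {
    FakeComm comm;
    LazyContext ctx(comm);
    Shared a(ctx, 1.0), b(ctx, 2.0), c(ctx, 3.0);
    a += 0.5;
    b += 1.5;
    c += 2.5;
    a += 0.5;
    double total = a.get() + b.get() + c.get();
    assert(total == 11.0);
    return comm.collectives;
}
```

In this sketch an eager scheme would issue one collective per `+=`, four in total, whereas deferring the work until the first `get()` lets the context pack all three pending values into a single buffer, so `collectives_for_three_updates()` returns 1. A real implementation would replace `FakeComm::allreduce_sum` with an actual reduction across processes.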