Efficient run-time support for irregular block-structured applications
Journal of Parallel and Distributed Computing - Special issue on irregular problems in supercomputing applications
Lessons learned from implementing BSP
Future Generation Computer Systems - Special issue on HPCN '97
Run-Time Fusion of MPI Calls in a Parallel C++ Library
LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
Array Design and Expression Evaluation in POOMA II
ISCOPE '98 Proceedings of the Second International Symposium on Computing in Object-Oriented Parallel Environments
Efficient Interprocedural Data Placement Optimisation in a Parallel Library
LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
TreadMarks: distributed shared memory on standard workstations and operating systems
WTEC'94 Proceedings of the USENIX Winter 1994 Technical Conference on USENIX Winter 1994 Technical Conference
Efficient shared-memory support for parallel graph reduction
Future Generation Computer Systems
CFL (Communication Fusion Library) is an experimental C++ library that supports shared reduction variables in MPI programs. It uses overloading to distinguish private variables from replicated, shared variables, and automatically introduces MPI communication to keep replicated data consistent. This paper concerns a simple but surprisingly effective technique that improves performance substantially: CFL operators are executed lazily in order to expose opportunities for run-time, context-dependent optimisations such as message aggregation and operator fusion. We evaluate the idea using both toy benchmarks and a 'production' code for simulating plankton population dynamics in the upper ocean. The results demonstrate the library's software engineering benefits, and show that performance close to that of manually optimised code can be achieved automatically in many cases.
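To make the idea of lazy execution and message aggregation concrete, the following is a minimal sketch and not CFL's actual interface. All names here (`MockComm`, `LazyRuntime`, `SharedDouble`) are hypothetical, and the communication layer is a single-process stand-in that only counts messages rather than calling MPI. Updates to shared variables are recorded locally, and the first read forces every pending reduction in one aggregated message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a communication layer (assumption: not MPI).
// It simulates a single process, so a sum-reduction leaves values unchanged;
// it only counts how many messages would have been sent.
struct MockComm {
    int messages = 0;
    void allreduce_sum(std::vector<double>& buf) { (void)buf; ++messages; }
};

class LazyRuntime;

// A replicated, shared reduction variable (hypothetical type).
// Assumption: a variable is read before it is destroyed, so the runtime
// never holds a dangling pointer to it.
class SharedDouble {
public:
    explicit SharedDouble(LazyRuntime& rt, double init = 0.0)
        : rt_(rt), global_(init) {}
    SharedDouble(const SharedDouble&) = delete;

    SharedDouble& operator+=(double local);  // deferred: no communication
    double value();                          // forces all pending reductions

private:
    friend class LazyRuntime;
    LazyRuntime& rt_;
    double global_;         // last globally consistent value
    double pending_ = 0.0;  // local contributions not yet communicated
    bool dirty_ = false;    // true while registered with the runtime
};

// Collects variables with pending updates and reduces them together.
class LazyRuntime {
public:
    explicit LazyRuntime(MockComm& comm) : comm_(comm) {}

    void enlist(SharedDouble* v) {
        if (!v->dirty_) {
            v->dirty_ = true;
            dirty_.push_back(v);
        }
    }

    // Message aggregation: pack every pending contribution into one buffer
    // and perform a single reduction for all of them.
    void force() {
        if (dirty_.empty()) return;
        std::vector<double> buf;
        for (SharedDouble* v : dirty_) buf.push_back(v->pending_);
        comm_.allreduce_sum(buf);
        for (std::size_t i = 0; i < dirty_.size(); ++i) {
            dirty_[i]->global_ += buf[i];
            dirty_[i]->pending_ = 0.0;
            dirty_[i]->dirty_ = false;
        }
        dirty_.clear();
    }

private:
    MockComm& comm_;
    std::vector<SharedDouble*> dirty_;
};

inline SharedDouble& SharedDouble::operator+=(double local) {
    pending_ += local;  // repeated updates to one variable fold together locally
    rt_.enlist(this);
    return *this;
}

inline double SharedDouble::value() {
    rt_.force();  // the first read triggers one aggregated reduction
    return global_;
}
```

In this sketch, updating two shared variables many times and then reading both costs one simulated message, whereas an eager design would communicate on every update. A real implementation would replace `MockComm` with actual MPI collectives and would need to handle other reduction operators and data types.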