This paper describes a combination of methods that make interprocedural data placement optimisation available to parallel libraries. We propose a delayed-evaluation, self-optimising (DESO) numerical library for a distributed-memory multicomputer. Delayed evaluation allows us to capture the control flow of a user program from within the library at runtime, and to construct an optimised execution plan by propagating data placement constraints backwards through the DAG representing the computation to be performed. Our strategy for optimising data placements at runtime has three parts: an efficient representation for data distributions; a greedy optimisation algorithm which, because of delayed evaluation, can take account of the full context of operations; and the re-use of results from previous runtime optimisations on contexts we have encountered before. We show performance figures for our library on a cluster of Pentium II Linux workstations. These demonstrate that the overhead of our delayed evaluation method is very small, and show both the parallel speedup we obtain and the benefit of the optimisations we describe.
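The ideas of delayed evaluation, backward propagation of placement constraints, and plan re-use can be illustrated with a small toy model. The following is a minimal sketch under stated assumptions, not the library's actual interface or its greedy algorithm: the names `Node`, `leaf`, `propagate`, `context_key`, `plan_for` and `plan_cache` are hypothetical, placements are reduced to the two labels "row" and "col", and the rule that a transpose flips the required placement of its operand is a simplification chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Node:
    """One node of the DAG recorded by delayed evaluation (hypothetical)."""
    op: str                          # "leaf", "add" or "transpose"
    args: Tuple["Node", ...] = ()
    name: str = ""                   # leaves only
    placement: str = ""              # leaves only: current layout, "row" or "col"

    def __add__(self, other: "Node") -> "Node":
        # Nothing is computed here; the operation is only recorded.
        return Node("add", (self, other))

    def transpose(self) -> "Node":
        return Node("transpose", (self,))


def leaf(name: str, placement: str) -> Node:
    return Node("leaf", name=name, placement=placement)


# Assumed toy rule: a row-distributed result of a transpose is cheapest
# when its operand is column-distributed, and vice versa.
FLIP = {"row": "col", "col": "row"}

Move = Tuple[str, str, str]  # (leaf name, current placement, required placement)


def propagate(node: Node, required: str, moves: List[Move]) -> None:
    """Push the placement required of the result back towards the leaves,
    recording each leaf whose current placement does not match."""
    if node.op == "leaf":
        if node.placement != required:
            moves.append((node.name, node.placement, required))
    elif node.op == "transpose":
        propagate(node.args[0], FLIP[required], moves)
    else:  # "add": assume operands must be aligned with the result
        for arg in node.args:
            propagate(arg, required, moves)


def context_key(node: Node) -> str:
    """Structural description of a DAG, used to recognise repeated contexts."""
    if node.op == "leaf":
        return f"{node.name}:{node.placement}"
    return f"{node.op}(" + ",".join(context_key(a) for a in node.args) + ")"


# Plans from earlier optimisations, keyed by context and required placement.
plan_cache: Dict[Tuple[str, str], Tuple[Move, ...]] = {}


def plan_for(node: Node, required: str) -> Tuple[Move, ...]:
    """Return the redistribution plan for a DAG, re-using a cached plan
    when the same context has been optimised before."""
    key = (context_key(node), required)
    if key not in plan_cache:
        moves: List[Move] = []
        propagate(node, required, moves)
        plan_cache[key] = tuple(moves)
    return plan_cache[key]
```

In this toy model, with `a = leaf("a", "row")` and `b = leaf("b", "col")`, asking for `plan_for(a + b, "row")` records that `b` must be redistributed from "col" to "row", whereas `plan_for(a + b.transpose(), "row")` needs no redistribution because the transpose already accounts for the layout of `b`. Seeing the whole expression before executing any of it is what makes this distinction possible, and a second request for the same context is answered from `plan_cache` without re-running the propagation.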