This paper describes a parallel implementation of a matrix/vector library for C++ on a large distributed-memory multicomputer. The library is "self-optimising": it exploits lazy evaluation, delaying the execution of matrix operations for as long as possible. This exposes the context in which each intermediate result is used. The run-time system extracts a functional representation of the values being computed and optimises data distribution, grain size and scheduling prior to execution. This exploits results from the theory of program transformation for optimising parallel functional programs, while presenting an entirely conventional interface to the programmer. We present details of some of the simple optimisations we have implemented so far and illustrate their effect using a small example.

Conventionally, optimisation is confined to compile-time, and compilation is completed before run-time. Many exciting opportunities are lost by this convenient divide; this paper presents one example of what becomes possible when it is crossed. We perform optimisation at run-time for three important reasons:

• We wish to deliver a library which uses parallelism to implement ADTs efficiently, callable from any client program (in any sensible language) without special parallel programming expertise. This means we cannot perform compile-time analysis of the caller's source code.

• We wish to perform optimisations which take advantage of how the client program uses intermediate values. This would be straightforward at compile-time, but not for a library called at run-time.

• We wish to take advantage of information available only at run-time, such as the way operations are composed, and the size and characteristics of intermediate data structures.

We aim to achieve much of the performance of compile-time optimisation, and possibly more by exploiting run-time information, while retaining the ease with which a library can be installed and used.
There is some run-time overhead involved, which limits the scope of the approach.