A lazy, self-optimising parallel matrix library

  • Authors:
  • Simon Govier; Paul H. J. Kelly

  • Affiliations:
  • Department of Computing, Imperial College, London, UK; Department of Computing, Imperial College, London, UK

  • Venue:
  • FP'95: Proceedings of the 1995 International Conference on Functional Programming
  • Year:
  • 1995

Abstract

This paper describes a parallel implementation of a matrix/vector library for C++ for a large distributed-memory multicomputer. The library is "self-optimising" by exploiting lazy evaluation: execution of matrix operations is delayed as much as possible. This exposes the context in which each intermediate result is used. The run-time system extracts a functional representation of the values being computed and optimises data distribution, grain size and scheduling prior to execution. This exploits results in the theory of program transformation for optimising parallel functional programs, while presenting an entirely conventional interface to the programmer. We present details of some of the simple optimisations we have implemented so far and illustrate their effect using a small example.

Conventionally, optimisation is confined to compile-time, and compilation is completed before run-time. Many exciting opportunities are lost by this convenient divide. This paper presents one example of such a possibility. We do optimisation at run-time for three important reasons:

  • We wish to deliver a library which uses parallelism to implement ADTs efficiently, callable from any client program (in any sensible language) without special parallel programming expertise. This means we cannot perform compile-time analysis of the caller's source code.
  • We wish to perform optimisations which take advantage of how the client program uses the intermediate values. This would be straightforward at compile-time, but not for a library called at run-time.
  • We wish to take advantage of information available only at run-time, such as the way operations are composed, and the size and characteristics of intermediate data structures.

We aim to get much of the performance of compile-time optimisation, possibly more by using run-time information, while retaining the ease with which a library can be installed and used. There is some run-time overhead involved, which limits the scope of the approach.
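
The following is a minimal sketch, not the paper's actual library API, of the general technique the abstract describes: matrix operators build a deferred expression DAG rather than computing immediately, so that the whole expression is visible to the run-time system when a result is finally forced. The names LazyMatrix, Node and force() are illustrative assumptions, and the evaluator here is sequential; a real run-time system would inspect the DAG at the forcing point to choose data distribution, grain size and schedule.

```cpp
// Sketch of lazy, DAG-building matrix operations (illustrative only).
#include <cstddef>
#include <iostream>
#include <memory>
#include <utility>
#include <vector>

struct Node;                                      // one vertex of the expression DAG
using NodeRef = std::shared_ptr<const Node>;

struct Node {
    enum class Op { Leaf, Add, Mul } op;
    std::vector<std::vector<double>> value;       // populated only for leaves
    NodeRef lhs, rhs;                             // children for Add / Mul
};

class LazyMatrix {
public:
    explicit LazyMatrix(std::vector<std::vector<double>> data)
        : node_(std::make_shared<Node>(
              Node{Node::Op::Leaf, std::move(data), nullptr, nullptr})) {}

    // Operators only record the operation; no arithmetic is performed here.
    friend LazyMatrix operator+(const LazyMatrix& a, const LazyMatrix& b) {
        return LazyMatrix(std::make_shared<Node>(
            Node{Node::Op::Add, {}, a.node_, b.node_}));
    }
    friend LazyMatrix operator*(const LazyMatrix& a, const LazyMatrix& b) {
        return LazyMatrix(std::make_shared<Node>(
            Node{Node::Op::Mul, {}, a.node_, b.node_}));
    }

    // Forcing is where a run-time optimiser could analyse the full DAG;
    // here we just evaluate it recursively and sequentially.
    std::vector<std::vector<double>> force() const { return eval(node_); }

private:
    explicit LazyMatrix(NodeRef n) : node_(std::move(n)) {}

    static std::vector<std::vector<double>> eval(const NodeRef& n) {
        if (n->op == Node::Op::Leaf) return n->value;
        auto a = eval(n->lhs);
        auto b = eval(n->rhs);
        const std::size_t rows = a.size();
        if (n->op == Node::Op::Add) {
            auto r = a;                           // element-wise sum
            for (std::size_t i = 0; i < rows; ++i)
                for (std::size_t j = 0; j < a[i].size(); ++j) r[i][j] += b[i][j];
            return r;
        }
        // Naive dense matrix product.
        std::vector<std::vector<double>> r(rows, std::vector<double>(b[0].size(), 0.0));
        for (std::size_t i = 0; i < rows; ++i)
            for (std::size_t k = 0; k < b.size(); ++k)
                for (std::size_t j = 0; j < b[0].size(); ++j)
                    r[i][j] += a[i][k] * b[k][j];
        return r;
    }

    NodeRef node_;
};

int main() {
    LazyMatrix A({{1, 2}, {3, 4}});
    LazyMatrix B({{5, 6}, {7, 8}});
    LazyMatrix C = A * B + A;    // builds the DAG; nothing is computed yet
    auto result = C.force();     // the whole expression is visible at this point
    for (const auto& row : result) {
        for (double x : row) std::cout << x << ' ';
        std::cout << '\n';
    }
    return 0;
}
```

The client-facing interface stays entirely conventional (ordinary C++ operators on matrix objects), which is the point emphasised in the abstract: the delaying and any subsequent optimisation happen behind the library boundary, without compile-time analysis of the caller's source code.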