Compiler optimizations for asynchronous systolic array programs

  • Authors:
  • M. Lam

  • Affiliations:
  • Department of Computer Science, Carnegie-Mellon University, Pittsburgh, Pennsylvania

  • Venue:
  • POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
  • Year:
  • 1988

Quantified Score

Hi-index 0.00

Visualization

Abstract

A programmable systolic array of high-performance cells is an attractive computation engine if it attains the same utilization of dedicated arrays of simple cells. However, typical implementation techniques used in high-performance processors, such as pipelining and parallel functional units, further complicate the already difficult task of systolic algorithm design. This paper shows that high-performance systolic arrays can be used effectively by presenting the machine to the user as an array of conventional processors communicating asynchronously. This abstraction allows the user to focus on the higher level problem of partitioning a computation across cells in the array. Efficient fine-grain parallelism can be achieved by code motion of communication operations made possible by the asynchronous communication model. This asynchronous communication model is recommended even for programming algorithms on systolic arrays without dynamic flow control between cells.The ideas presented in the paper have been validated in the compiler for the Warp machine [4]. The compiler has been in use in various application areas including robot navigation, low-level vision, signal processing and scientific programming. Near-optimal code has been generated for many published systolic algorithms.