Compiler optimizations for asynchronous systolic array programs

Authors:
M. Lam
Affiliations:
Department of Computer Science, Carnegie-Mellon University, Pittsburgh, Pennsylvania
Venue:
POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Year:
1988

Citing 6
Cited 4

Compilation for a high-performance systolic array

SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
The warp computer: Architecture, implementation, and performance

IEEE Transactions on Computers
A Fortran compiler for the FPS-164 scientific computer

SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
A parallel language and its compilation to multiprocessor machines or VLSI

POPL '86 Proceedings of the 13th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing

MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
Automatic synthesis of systolic arrays from uniform recurrent equations

ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture

Software pipelining: an effective scheduling technique for VLIW machines

PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Compiling programs for a linear systolic array

PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Software pipelining: an effective scheduling technique for VLIW machines

ACM SIGPLAN Notices - Best of PLDI 1979-1999
Task partitioning for multi-core network processors

CC'05 Proceedings of the 14th international conference on Compiler Construction

Quantified Score

Hi-index	0.00

Visualization

Abstract

A programmable systolic array of high-performance cells is an attractive computation engine if it attains the same utilization of dedicated arrays of simple cells. However, typical implementation techniques used in high-performance processors, such as pipelining and parallel functional units, further complicate the already difficult task of systolic algorithm design. This paper shows that high-performance systolic arrays can be used effectively by presenting the machine to the user as an array of conventional processors communicating asynchronously. This abstraction allows the user to focus on the higher level problem of partitioning a computation across cells in the array. Efficient fine-grain parallelism can be achieved by code motion of communication operations made possible by the asynchronous communication model. This asynchronous communication model is recommended even for programming algorithms on systolic arrays without dynamic flow control between cells.The ideas presented in the paper have been validated in the compiler for the Warp machine [4]. The compiler has been in use in various application areas including robot navigation, low-level vision, signal processing and scientific programming. Near-optimal code has been generated for many published systolic algorithms.