Compilation for a high-performance systolic array
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
The warp computer: Architecture, implementation, and performance
IEEE Transactions on Computers
A Fortran compiler for the FPS-164 scientific computer
SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
A parallel language and its compilation to multiprocessor machines or VLSI
POPL '86 Proceedings of the 13th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
Automatic synthesis of systolic arrays from uniform recurrent equations
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Compiling programs for a linear systolic array
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Software pipelining: an effective scheduling technique for VLIW machines
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Task partitioning for multi-core network processors
CC'05 Proceedings of the 14th international conference on Compiler Construction
Hi-index | 0.00 |
A programmable systolic array of high-performance cells is an attractive computation engine if it attains the same utilization of dedicated arrays of simple cells. However, typical implementation techniques used in high-performance processors, such as pipelining and parallel functional units, further complicate the already difficult task of systolic algorithm design. This paper shows that high-performance systolic arrays can be used effectively by presenting the machine to the user as an array of conventional processors communicating asynchronously. This abstraction allows the user to focus on the higher level problem of partitioning a computation across cells in the array. Efficient fine-grain parallelism can be achieved by code motion of communication operations made possible by the asynchronous communication model. This asynchronous communication model is recommended even for programming algorithms on systolic arrays without dynamic flow control between cells.The ideas presented in the paper have been validated in the compiler for the Warp machine [4]. The compiler has been in use in various application areas including robot navigation, low-level vision, signal processing and scientific programming. Near-optimal code has been generated for many published systolic algorithms.