A matrix multiplication algorithm on a linear array of processing elements is described. The local storage required by each processing element and the I/O bandwidth required to drive the array are both constants, independent of the sizes of the matrices being multiplied. The algorithm is therefore modular: arbitrarily large matrices can be multiplied on a large array built by cascading smaller arrays. Each matrix element is read only once, from a fixed I/O port, and the algorithm does not use global broadcasting. It is also shown that the proposed algorithm computes the n^3 scalar products (where the two matrices being multiplied are n x n) using an optimal number of processing elements.
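To make the count of n^3 scalar products concrete, here is a minimal sequential sketch in Python. It is not the linear-array algorithm described above, whose data flow the abstract does not specify. It is only the textbook triple loop, and the function name matmul_count is a hypothetical one chosen for illustration. The sketch shows that multiplying two n x n matrices in the standard way takes exactly n^3 multiply-accumulate steps, which is the work the array has to distribute over its processing elements.

```python
def matmul_count(a, b):
    """Multiply two n x n matrices (lists of lists) with the textbook
    triple loop and count the scalar products performed.

    Illustrative only: this is a sequential reference, not the
    linear-array algorithm from the abstract.
    """
    n = len(a)
    c = [[0] * n for _ in range(n)]
    products = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # One scalar product, accumulated into c[i][j].
                c[i][j] += a[i][k] * b[k][j]
                products += 1
    return c, products


if __name__ == "__main__":
    c, products = matmul_count([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(c, products)
```

For n = 2 the counter reaches 8, and in general it reaches n * n * n, since each of the n^2 output entries accumulates n products.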