A simple mapping technique is developed to design systolic arrays with limited I/O capability. The technique is used to improve systolic algorithms for some matrix computations on linearly connected arrays of PEs (processor elements) with constant I/O bandwidth. The important features of these designs are modularity with constant hardware in each PE, few control lines, simple data-input/output format, and improved delay time. This technique is extended to design an optimal $n\sqrt{n}$-time systolic algorithm for $n \times n$ matrix multiplication with an $O(\sqrt{n})$ I/O bandwidth requirement on a fault-tolerant VLSI model. In this model, the propagation delay is assumed to be proportional to wire length. Fault reconfiguration is achieved by using buffers to bypass faulty PEs, which does not affect the clock rate of the system. The unidirectional flow of control and data assures correctness of the algorithm in the presence of faulty PEs. The design can be implemented on reconfigurable fault-tolerant VLSI arrays using the Diogenes methodology. The present designs are compared to those in the literature and are shown to be superior with respect to I/O format, control, and delay from input to output.
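The idea of unidirectional data flow past bypassed PEs can be illustrated with a small cycle-by-cycle simulation. The sketch below is a toy model and not the design described in the abstract: it assumes one matrix element enters the array per cycle (constant I/O bandwidth), that each healthy PE holds one column of B, and that a faulty PE acts as a plain forwarding register. The function name `systolic_matmul` and its parameters `num_pes` and `faulty` are hypothetical. It runs in roughly n^2 cycles, so it does not reproduce the $n\sqrt{n}$-time result, and unlike the buffered bypass in the abstract, each faulty PE here adds one cycle of latency.

```python
def systolic_matmul(A, B, num_pes, faulty=()):
    """Toy simulation of C = A * B on a linear array of PEs.

    Assumptions (illustrative only, not taken from the paper):
      - elements of A enter at the left end, one per cycle, in row-major order;
      - data moves one position to the right per cycle (unidirectional flow);
      - the j-th healthy PE stores column j of B and accumulates column j of C;
      - a faulty PE only forwards data, like a bypass register.
    Returns the product and the number of simulated cycles.
    """
    n = len(A)
    faulty = set(faulty)
    healthy = [p for p in range(num_pes) if p not in faulty]
    if len(healthy) < n:
        raise ValueError("not enough healthy PEs for an n x n product")

    # Map each used physical position to the column of B it is responsible for.
    column_of = {p: j for j, p in enumerate(healthy[:n])}
    length = healthy[n - 1] + 1  # physical span of the array actually used

    # Each stream item carries its row index i, inner index k, and value A[i][k].
    stream = [(i, k, A[i][k]) for i in range(n) for k in range(n)]
    regs = [None] * length  # one data register per physical position
    C = [[0] * n for _ in range(n)]

    cycles = len(stream) + length - 1  # last item must reach the last position
    for t in range(cycles):
        incoming = stream[t] if t < len(stream) else None
        regs = [incoming] + regs[:-1]  # shift everything one step to the right
        for p, item in enumerate(regs):
            if item is not None and p in column_of:
                i, k, a = item
                j = column_of[p]
                C[i][j] += a * B[k][j]  # healthy PE: multiply-accumulate
            # positions not in column_of (faulty PEs) just hold the item
    return C, cycles


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(systolic_matmul(A, B, num_pes=2))
    print(systolic_matmul(A, B, num_pes=3, faulty={1}))
```

Because data only moves in one direction, skipping a faulty position changes when each healthy PE sees an element but not which elements it sees, so the result is the same with or without the fault; only the latency differs in this toy model.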