In recent years, many systolic algorithms have been proposed as solutions to computationally demanding problems in signal and image processing and other areas. Such algorithms exploit the regularity and parallelism of problems to achieve high performance and low I/O requirements. Since systolic algorithms generally consist of a few types of simple processors, or systolic cells, connected in a regular pattern, they are less expensive to design and implement than more general machines. This advantage is offset by the fact that a particular systolic system can generally be used only on a narrow set of problems, and thus design cost cannot be amortized over a large number of units. One way to approach this problem is to provide a programmable systolic chip (PSC), many copies of which can be connected and programmed to implement many systolic algorithms. The systolic environment, by virtue of its emphasis on continuous, regular flow of data and fairly simple per-cell processing, imposes new design requirements for programmable processors which are quite different from those found in a general-purpose system. This paper describes the CMU PSC, a single-chip microprocessor suitable for use in groups of tens or hundreds for the efficient implementation of a broad variety of systolic arrays. The processor has been fabricated in nMOS, and is undergoing testing.
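To make the idea of simple cells connected in a regular pattern, with data flowing through them in lockstep, more concrete, here is a minimal software sketch of a linear systolic array that evaluates a polynomial by Horner's rule. It is an illustration only and does not model the PSC's actual architecture or instruction set; the names HornerCell and run_array, and the choice of polynomial evaluation as the example problem, are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class HornerCell:
    """One hypothetical systolic cell: holds a coefficient and two registers."""
    coeff: float
    x: Optional[float] = None  # latched input value passing through the cell
    y: Optional[float] = None  # latched partial result

    def step(self, x_in: Optional[float],
             y_in: Optional[float]) -> Tuple[Optional[float], Optional[float]]:
        """Return the next register values given the left neighbour's outputs."""
        if x_in is None:  # a pipeline bubble: nothing to compute
            return None, None
        return x_in, y_in * x_in + self.coeff


def run_array(coeffs: List[float], xs: List[float]) -> List[float]:
    """Stream xs through a chain of cells; coeffs are highest degree first.

    Assumes coeffs is non-empty. Returns p(x) for each x, in input order.
    """
    cells = [HornerCell(c) for c in coeffs]
    results: List[float] = []
    # Trailing bubbles flush the values still inside the pipeline.
    stream: List[Optional[float]] = list(xs) + [None] * (len(cells) - 1)
    for x_in in stream:
        # Snapshot every cell's current outputs before updating any of them,
        # so that all cells behave as if clocked simultaneously.
        head = (x_in, 0.0 if x_in is not None else None)
        inputs = [head] + [(c.x, c.y) for c in cells[:-1]]
        for cell, (xi, yi) in zip(cells, inputs):
            cell.x, cell.y = cell.step(xi, yi)
        if cells[-1].x is not None:
            results.append(cells[-1].y)
    return results


# Evaluate 2x^2 + 3x + 1 at x = 0, 1, 2 using three cells.
values = run_array([2.0, 3.0, 1.0], [0.0, 1.0, 2.0])
```

Each cell performs the same multiply-add and differs only in its stored coefficient, so changing the coefficients, or the per-cell operation, changes the computation without changing the interconnection. That is the sense in which many copies of one programmable cell could serve many systolic algorithms.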