Modern microelectronics technology holds the promise of high-level, computer-aided design of very complex systems on a single silicon chip. The freedom to create non-standard architectures within this context has stimulated widespread interest in computing structures that offer increased processing speed relative to the von Neumann architecture. This paper describes two architectures that are especially well suited to large-scale integration because of their concurrent structure and their reliance on primarily local data flows. The first architecture implements the QR matrix decomposition and can be used to solve the least squares and eigenvalue problems of linear algebra reliably. The second architecture is based on an elementary building-block approach to the realization of FIR and IIR lattice digital filters. An important issue that is sometimes overlooked in the system-level design of an architecture is the impact that the particular scheme for implementing fundamental operations, such as multiplication and rotation, has on the performance of the highly parallel computing structure. It is argued that if one can embed pipelined operations within the concurrent computing structure, the resulting system will often not only provide substantial processing gain but can also be implemented so as to make efficient use of chip real estate.
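The QR-based least-squares computation described above can be sketched in software. The following pure-Python sketch (the function names are illustrative, not from the paper) triangularizes the data matrix with Givens plane rotations — the same rotation primitive that a CORDIC-equipped systolic array would pipeline cell by cell — and then back-substitutes to recover the least-squares solution:

```python
import math

def givens(a, b):
    # Rotation (c, s) with [[c, s], [-s, c]] @ [a, b]^T = [r, 0]^T,
    # i.e. the plane rotation that annihilates b against a.
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r

def qr_least_squares(A, y):
    """Solve min ||A x - y|| for an m x n matrix A (m >= n, full rank)
    by reducing A to upper-triangular R with Givens rotations."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]   # work on copies
    z = y[:]
    for j in range(n):
        for i in range(j + 1, m):
            # Zero out R[i][j] using row j as the pivot row.
            c, s = givens(R[j][j], R[i][j])
            for k in range(j, n):
                t = c * R[j][k] + s * R[i][k]
                R[i][k] = -s * R[j][k] + c * R[i][k]
                R[j][k] = t
            # Apply the same rotation to the right-hand side.
            t = c * z[j] + s * z[i]
            z[i] = -s * z[j] + c * z[i]
            z[j] = t
    # Back-substitution on the upper-triangular system R x = z[:n].
    x = [0.0] * n
    for j in range(n - 1, -1, -1):
        x[j] = (z[j] - sum(R[j][k] * x[k] for k in range(j + 1, n))) / R[j][j]
    return x
```

In the systolic setting each rotation is computed once at a boundary cell and propagated rightward through the row, so the doubly nested loop above unrolls into a wavefront of concurrent, locally communicating cells; the sequential sketch only makes the data dependencies explicit.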