This paper proposes an efficient systolic array construction method for optimal planar systolic design of matrix multiplication. By adjusting the connection network among the systolic array's processing elements (PEs), the input/output data "jump" through the array to satisfy the data-movement requirements of the multiplication operation. Various 2-D systolic array topologies, such as the square and hexagonal topologies, are studied to construct appropriate systolic array configurations and realize high-performance matrix multiplication. Building on the traditional Kung-Leiserson systolic architecture, the proposed "Jumping Systolic Array" (JSA) algorithm increases matrix multiplication speed with fewer processing elements and only a few additional data registers. New systolic arrays, such as the square jumping array, the redundant-dummy-latency jumping hexagonal array, and the compact parallel-flow jumping hexagonal array, are also proposed to improve the efficiency of concurrent system operation. Experimental results show that the JSA algorithm realizes fully concurrent operation and outperforms other systolic architectures in specific system characteristics such as bandwidth, matrix complexity, and expansion capability.
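For readers unfamiliar with the baseline the abstract refers to, the following is a minimal software sketch of the cycle-by-cycle schedule of a conventional square 2-D systolic array for matrix multiplication (the Kung-Leiserson-style skewed data flow), not the JSA variant itself. The function name `systolic_matmul` and the cycle-level formulation are illustrative assumptions, not taken from the paper.

```python
def systolic_matmul(A, B):
    """Simulate the timing schedule of a square N x N systolic array
    computing C = A * B.  With the classic skewed inputs (rows of A
    streaming from the left, columns of B from the top), PE (i, j)
    multiplies the operand pair (A[i][k], B[k][j]) at cycle
    t = i + j + k and accumulates the product locally.
    """
    N = len(A)
    C = [[0] * N for _ in range(N)]
    # The pipeline fills and drains in 3N - 2 cycles total.
    for t in range(3 * N - 2):
        for i in range(N):
            for j in range(N):
                k = t - i - j  # which operand pair reaches PE (i, j) now
                if 0 <= k < N:
                    C[i][j] += A[i][k] * B[k][j]
    return C
```

The `3N - 2` cycle count is the latency the jumping-array constructions in the paper aim to reduce: by letting data skip over PEs, operands can reach their multiply-accumulate sites in fewer propagation steps.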