Design and implementation of a high-speed matrix multiplier based on word-width decomposition

Authors:
Sangjin Hong; Kyoung-Su Park; Jun-Hee Mun
Affiliations:
Department of Electrical and Computer Engineering, State University of New York (SUNY) at Stony Brook, Stony Brook, NY (all authors)
Venue:
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Year:
2006

Abstract

This paper presents a flexible 2 × 2 matrix multiplier architecture based on word-width decomposition, which enables flexible yet high-speed operation. The elements of the matrices are successively decomposed so that a set of small multipliers and simple adders generate partial results, which are then combined to produce the final results. An energy reduction mechanism is incorporated into the architecture to minimize the power dissipated by unnecessary logic switching. Two decomposition schemes that support 2's complement inputs are discussed, and the overall design is implemented and its functionality verified on a field-programmable gate array (FPGA). The architecture can be easily extended to a reconfigurable matrix multiplier. We report the performance of the proposed architecture based on FPGA post-synthesis results and summarize the design factors that influence overall execution speed and complexity.
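The decomposition idea in the abstract can be illustrated with a minimal Python sketch. It assumes one plausible scheme, not necessarily either of the two schemes in the paper: each W-bit 2's complement operand is split into K-bit chunks, with the most significant chunk treated as signed and the remaining chunks as unsigned, so that small K-bit multipliers produce partial products that shifts and adds recombine. The names `W`, `K`, `decompose`, `small_mul`, `decomposed_mul`, and `matmul_2x2` are hypothetical, and the energy reduction mechanism and any hardware pipelining are not modeled.

```python
"""Sketch of word-width decomposition for a 2x2 matrix multiply.

Assumption (not taken from the paper): each W-bit 2's complement operand is
split into K-bit chunks; the top chunk is signed, the lower chunks unsigned.
"""

W = 16  # hypothetical operand width in bits
K = 4   # hypothetical chunk (sub-multiplier) width in bits


def decompose(x: int, width: int = W, k: int = K) -> list[int]:
    """Split a 2's complement integer into k-bit chunks, least significant first.

    Lower chunks are unsigned in [0, 2**k); the top chunk is signed in
    [-2**(k-1), 2**(k-1)), so that x == sum(c * 2**(k*i)).
    """
    if width % k != 0:
        raise ValueError("width must be a multiple of k")
    if not -(1 << (width - 1)) <= x < (1 << (width - 1)):
        raise ValueError("operand out of range for the given width")
    raw = x & ((1 << width) - 1)  # 2's complement bit pattern of x
    n = width // k
    chunks = [(raw >> (k * i)) & ((1 << k) - 1) for i in range(n)]
    if chunks[-1] >= 1 << (k - 1):  # reinterpret the top chunk as signed
        chunks[-1] -= 1 << k
    return chunks


def small_mul(a: int, b: int) -> int:
    """Stand-in for one small chunk-by-chunk hardware multiplier."""
    return a * b


def decomposed_mul(x: int, y: int, width: int = W, k: int = K) -> int:
    """Multiply via chunk partial products, recombined with shifts and adds."""
    xs, ys = decompose(x, width, k), decompose(y, width, k)
    acc = 0
    for i, xi in enumerate(xs):
        for j, yj in enumerate(ys):
            # Partial product weighted by 2**(k*(i+j)); Python's << on a
            # negative int multiplies by the power of two, like a sign-extended shift.
            acc += small_mul(xi, yj) << (k * (i + j))
    return acc


def matmul_2x2(a, b, width: int = W, k: int = K):
    """C = A @ B for 2x2 integer matrices using only decomposed multiplies."""
    return [
        [sum(decomposed_mul(a[r][t], b[t][c], width, k) for t in range(2))
         for c in range(2)]
        for r in range(2)
    ]


if __name__ == "__main__":
    A = [[3, -7], [-32768, 1234]]
    B = [[-5, 2], [100, -1]]
    print(matmul_2x2(A, B))  # [[-715, 13], [287240, -66770]]
```

With W = 16 and K = 4, each product is assembled from 16 partial products of 4-bit chunks. Only the partial products involving a most significant chunk carry a sign, so in a hardware realization those sub-multipliers would have to accept signed operands while the rest can remain unsigned.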