Energy-Efficient Matrix Multiplication on FPGAs

Authors:
Ju-wook Jang; Seonil Choi; Viktor K. Prasanna
Venue:
FPL '02 Proceedings of the Reconfigurable Computing Is Going Mainstream, 12th International Conference on Field-Programmable Logic and Applications
Year:
2002

Abstract

We develop new algorithms and architectures for matrix multiplication on configurable devices. These designs significantly reduce energy dissipation and latency compared with state-of-the-art FPGA-based designs. We derive functions that capture the impact of algorithm-level design choices on system-wide energy dissipation, latency, and area, taking into account algorithm and architecture details, including features of the target FPGA. These functions are used to optimize energy performance under latency and area constraints for a family of candidate algorithms and architectures. As a result, our designs improve on the energy performance of the optimized design from the recent Xilinx library by 32% to 88% without any increase in area-latency product. In terms of comprehensive metrics such as EAT (Energy-Area-Time) and E/AT (Energy/Area-Time), our designs outperform the Xilinx design by 50%-79% and 13%-44%, respectively. We also address how to exploit further increases in the density of future FPGA devices for asymptotic improvement in latency and energy dissipation when multiplying larger matrices.
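The optimization described in the abstract (choose the lowest-energy design that meets latency and area limits, then compare designs on EAT and E/AT) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the `Design` class, the `best_under_constraints` helper, the candidate values, and the units are hypothetical and are not taken from the paper. E/AT is read here as energy divided by the area-time product, which is one plausible interpretation of "Energy/Area-Time". The paper derives energy, area, and latency from analytical functions of the algorithm and FPGA parameters; this sketch simply takes them as given numbers.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Design:
    """One candidate design point (all values are hypothetical placeholders)."""
    name: str
    energy: float   # energy dissipation, arbitrary units
    area: float     # occupied area, arbitrary units (e.g. slices)
    latency: float  # time to complete one multiplication, arbitrary units

    @property
    def eat(self) -> float:
        # Energy-Area-Time product: lower is better.
        return self.energy * self.area * self.latency

    @property
    def e_per_at(self) -> float:
        # Assumed reading of E/AT: energy divided by the area-time product.
        return self.energy / (self.area * self.latency)


def best_under_constraints(
    designs: Iterable[Design], max_area: float, max_latency: float
) -> Optional[Design]:
    """Return the minimum-energy design that satisfies both constraints."""
    feasible = [
        d for d in designs
        if d.area <= max_area and d.latency <= max_latency
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda d: d.energy)


# Hypothetical candidate family; the numbers are made up for the example.
candidates = [
    Design("A", energy=100.0, area=200.0, latency=1.0),
    Design("B", energy=60.0, area=400.0, latency=0.5),
    Design("C", energy=40.0, area=900.0, latency=0.25),
]

best = best_under_constraints(candidates, max_area=500.0, max_latency=1.0)
```

With these made-up numbers, design C has the lowest energy but violates the area limit, so the search falls back to design B. Tightening `max_area` below every candidate's area makes the helper return `None`, which signals that no design in the family is feasible.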