Energy- and time-efficient matrix multiplication on FPGAs

Authors:
Ju-Wook Jang; Seonil B. Choi; Viktor K. Prasanna
Affiliations:
Department of Electronic Engineering, Sogang University, Seoul, Korea; Intel Corporation, Chandler, AZ; Department of Electrical Engineering-Systems, University of Southern California, Los Angeles, CA
Venue:
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Year:
2005

Abstract

We develop new algorithms and architectures for matrix multiplication on configurable devices that reduce energy dissipation and latency compared with state-of-the-art field-programmable gate array (FPGA)-based designs. By profiling well-known designs, we identify "energy hot spots," which are responsible for most of the energy dissipation. Based on this analysis, we develop algorithms and architectures that offer tradeoffs among the number of I/O ports, the number of registers, and the number of processing elements (PEs). To avoid time-consuming low-level simulations for energy profiling and performance prediction of many alternative designs, we derive functions that represent the impact of algorithm design choices on system-wide energy dissipation, area, and latency. These functions are used either to optimize energy performance or to provide tradeoffs across a family of candidate algorithms and architectures. For selected designs, we perform extensive low-level simulations using state-of-the-art tools and target FPGA devices. We present a design space for matrix multiplication on FPGAs that exhibits tradeoffs among energy, area, and latency. For example, our designs improve the energy performance of state-of-the-art FPGA-based designs by 29%-51% without any increase in the area-latency product. The latency of our designs is reduced to between one-third and one-fifteenth of that of the state-of-the-art designs, while area increases by a factor of 1.9 to 9.4. In terms of comprehensive metrics such as Energy-Area-Time (EAT), our designs outperform the state-of-the-art designs by 50%-79%.
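The abstract describes replacing low-level simulation with analytical functions of the design choices (number of PEs, registers, and I/O ports), then using those functions either to minimize energy or to expose tradeoffs. The Python sketch below shows only the general shape of such a design-space sweep. The cost formulas and every coefficient (`E_MAC`, `E_REG`, `E_IO`, `P_STATIC`, `A_PE`, `A_REG`, `A_IO`), as well as the helper names `model` and `sweep`, are hypothetical placeholders, not the functions or measurements derived in the paper. Energy-Area-Time is taken here as the plain product E * A * T.

```python
"""Hypothetical analytical design-space sweep for n x n matrix multiplication.

Every coefficient and formula here is an illustrative assumption; none of
them are the functions or measured values from the paper.
"""
from dataclasses import dataclass

# Hypothetical energy coefficients (arbitrary units, e.g. nJ).
E_MAC = 1.0      # one multiply-accumulate
E_REG = 0.02     # one register kept active for one cycle
E_IO = 0.5       # one word moved through an I/O port
P_STATIC = 0.1   # static (quiescent) energy per cycle, whole device

# Hypothetical area coefficients (e.g. FPGA slices).
A_PE = 200       # one PE: multiplier, adder, control
A_REG = 2        # one word-wide register
A_IO = 50        # one I/O port with its interface logic


@dataclass
class Design:
    pes: int         # number of processing elements
    io_ports: int    # number of I/O ports
    energy: float
    area: int
    latency: int     # cycles

    @property
    def eat(self) -> float:
        """Energy-Area-Time, taken here as the plain product E * A * T."""
        return self.energy * self.area * self.latency


def model(n: int, pes: int, regs_per_pe: int, io_ports: int) -> Design:
    """Estimate energy, area and latency of one n x n product (hypothetical)."""
    macs = n ** 3
    words = 3 * n * n                          # read A and B, write C
    compute_cycles = -(-macs // pes)           # ceiling division
    io_cycles = -(-words // io_ports)
    latency = max(compute_cycles, io_cycles)   # whichever resource limits
    regs = pes * regs_per_pe
    energy = (macs * E_MAC
              + regs * latency * E_REG         # register storage energy
              + words * E_IO                   # I/O transfer energy
              + latency * P_STATIC)            # static energy over the run
    area = pes * A_PE + regs * A_REG + io_ports * A_IO
    return Design(pes, io_ports, energy, area, latency)


def sweep(n: int, pe_options=(1, 2, 4, 8, 16), port_options=(1, 2, 3)):
    """Evaluate every (PEs, I/O ports) combination in the candidate family."""
    # Assumption: each PE buffers n words (e.g. one row of an operand).
    return [model(n, p, n, k) for p in pe_options for k in port_options]


if __name__ == "__main__":
    designs = sweep(16)
    best_eat = min(designs, key=lambda d: d.eat)
    # Tradeoff query: lowest-energy design under a hypothetical area budget.
    affordable = [d for d in designs if d.area <= 1000]
    best_small = min(affordable, key=lambda d: d.energy)
    for label, d in (("min EAT", best_eat), ("min E, A<=1000", best_small)):
        print(label, d.pes, d.io_ports, d.latency, round(d.energy, 2), d.area)
```

With these made-up coefficients, the minimum-EAT choice for n = 16 is the largest array (16 PEs, 3 ports), while the area-budget query picks a smaller, slower one (4 PEs, 1 port). With coefficients obtained from power estimation of a real target device, the ranking could well differ; the point of the sketch is only that each candidate is scored by evaluating formulas rather than by simulation.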