On Synthesizing Optimal Family of Linear Systolic Arrays for Matrix Multiplication
IEEE Transactions on Computers
Adaptive filter theory (2nd ed.)
Adaptive filter theory (2nd ed.)
Practical low power digital VLSI design
Practical low power digital VLSI design
Dynamic power consumption in Virtex™-II FPGA family
FPGA '02 Proceedings of the 2002 ACM/SIGDA tenth international symposium on Field-programmable gate arrays
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
High-Level Power Analysis and Optimization
High-Level Power Analysis and Optimization
Energy-Efficient Matrix Multiplication on FPGAs
FPL '02 Proceedings of the Reconfigurable Computing Is Going Mainstream, 12th International Conference on Field-Programmable Logic and Applications
High-Level Power Modeling of CPLDs and FPGAs
ICCD '01 Proceedings of the International Conference on Computer Design: VLSI in Computers & Processors
Domain-Specific Modeling for Rapid Energy Estimation of Reconfigurable Architectures
The Journal of Supercomputing
FCCM '04 Proceedings of the 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Energy- and time-efficient matrix multiplication on FPGAs
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Algorithm design methodology for embedded architectures
ARC'13 Proceedings of the 9th international conference on Reconfigurable Computing: architectures, tools, and applications
A Reconfigurable Parallel Hardware Implementation of the Self-Tuning Regulator
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Hi-index | 0.00 |
Recently, energy dissipation for computations on FPGAs has become an important performance metric. In this paper, we summarize our recent efforts in developing an algorithm-level design methodology for optimizing the energy performance of FPGA based implementations. For kernels, our design methodology consists of four steps: domain selection, domain-specific energy modeling, domain-space exploration and low-level simulation. To achieve system-level energy-efficiency, we outline a design methodology that integrates the kernel-level design methodology. Both the design methodologies can be used to achieve not only energy-efficiency but also latency, area, and power efficiency. We consider signal processing kernels as illustrative examples and demonstrate energy and time efficient algorithms and implementations for these on FPGAs. Example energy performance optimization through algorithmic optimizations include the 29--51% improvement in energy performance for a matrix multiplication kernel, 57--78% improvement for a FFT kernel and the 10--60% improvement for a floating-point LU decomposition kernel over state-of-the-art implementations. Similarly, an improvement of 41 to 46% in energy performance was achieved by the system-level design approach over a greedy approach for a MVDR adaptive beamforming application. Finally we briefly describe a high-level tool for obtaining parameterized and energy-efficient designs on FPGAs.