FPGA-based, floating-point reduction operations

Authors:
Michael R. Bodnar;James P. Durbano;John R. Humphrey;Petersen F. Curt;Dennis W. Prather
Affiliations:
Electrical and Computer Engineering Department, University of Delaware, Newark, DE;Accelerated Computing Division, EM Photonics, Inc., Newark, DE;Accelerated Computing Division, EM Photonics, Inc., Newark, DE;Accelerated Computing Division, EM Photonics, Inc., Newark, DE;Electrical and Computer Engineering Department, University of Delaware, Newark, DE
Venue:
MATH'06 Proceedings of the 10th WSEAS International Conference on APPLIED MATHEMATICS
Year:
2006


Abstract

Floating-point reduction operations are a vital part of scientific computational kernels such as vector dot products, discrete cosine transforms (DCTs), and matrix-matrix multiplications. As FPGAs continue to gain popularity in custom and embedded computing platforms, implementations of these kernels on such platforms are desirable. Because high-performance floating-point cores in FPGAs are inherently deeply pipelined, reduction circuits require special feedback and buffering schemes to sustain full throughput. In this paper, we present our floating-point reduction architecture, which is clocked at more than 150 MHz when targeting a Xilinx Virtex2 8000-4 FPGA.
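
The pipelining problem described in the abstract can be illustrated with a small software model. The sketch below is not the architecture presented in the paper; it is a minimal, assumed model of a single adder whose result appears `latency` cycles after its operands are issued. The names `pipelined_accumulate` and `reduce_partials` and the default latency of 4 are hypothetical. Feeding the adder's output straight back as an operand at one input per cycle leaves `latency` independent partial sums in flight, which is why a further buffering and combining step is needed before a single result exists.

```python
from collections import deque


def pipelined_accumulate(values, latency=4):
    """Model one pipelined adder accepting one new input per cycle.

    Assumption: the adder's output is fed straight back as its second
    operand, so input i is added to the sum issued `latency` cycles earlier.
    Returns the partial sums still held in the pipeline at the end.
    """
    # Sums currently in flight; the leftmost entry emerges next.
    pipeline = deque([0.0] * latency)
    for x in values:
        emerging = pipeline.popleft()   # result issued `latency` cycles ago
        pipeline.append(emerging + x)   # re-issue it with the new input
    return list(pipeline)


def reduce_partials(partials):
    """Combine the leftover partial sums pairwise, as an adder tree would."""
    while len(partials) > 1:
        paired = [partials[i] + partials[i + 1]
                  for i in range(0, len(partials) - 1, 2)]
        if len(partials) % 2:           # carry an unpaired element forward
            paired.append(partials[-1])
        partials = paired
    return partials[0]


if __name__ == "__main__":
    data = [float(i) for i in range(1, 11)]
    partials = pipelined_accumulate(data, latency=4)
    print(partials, reduce_partials(partials))
```

In this model the accumulation loop never stalls, but the final combining pass costs extra adder operations. A hardware design would presumably aim to hide that cost behind the next input set, which is the kind of feedback and buffering issue the abstract refers to.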