IEEE Transactions on Computers
Closing the Gap: CPU and FPGA Trends in Sustainable Floating-Point BLAS Performance
FCCM '04 Proceedings of the 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Sparse Matrix-Vector multiplication on FPGAs
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
Design Tradeoffs for BLAS Operations on Reconfigurable Hardware
ICPP '05 Proceedings of the 2005 International Conference on Parallel Processing
High Performance Linear Algebra Operations on Reconfigurable Systems
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
High-Performance Reduction Circuits Using Deeply Pipelined Operators on FPGAs
IEEE Transactions on Parallel and Distributed Systems
FPGA-based, floating-point reduction operations
MATH'06 Proceedings of the 10th WSEAS International Conference on APPLIED MATHEMATICS
Journal of Parallel and Distributed Computing
An improved reduction algorithm with deeply pipelined operators
SMC'09 Proceedings of the 2009 IEEE international conference on Systems, Man and Cybernetics
VFloat: A Variable Precision Fixed- and Floating-Point Library for Reconfigurable Hardware
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Accelerating DTI tractography using FPGAs
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
An efficient FPGA matrix multiplier for linear system simulation
Proceedings of the 2013 Grand Challenges on Modeling and Simulation Conference
The use of pipelined floating-point arithmetic cores to create high-performance FPGA-based computational kernels has introduced a new class of problems that do not exist when using single-cycle arithmetic cores. In particular, the data hazards associated with pipelined floating-point reduction circuits can limit the scalability or severely reduce the performance of an otherwise high-performance computational kernel. The inability to efficiently execute the reduction in hardware, coupled with memory bandwidth issues, may even negate the performance gains derived from hardware acceleration of the kernel. In this paper we introduce a method for developing scalable floating-point reduction circuits that run in optimal time while requiring only Θ(lg(n)) space and a single pipelined floating-point unit. Using a Xilinx Virtex-II Pro as the target device, we implement reference instances of our reduction method and present the FPGA design statistics supporting our scalability claims.
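The abstract's Θ(lg(n)) space bound can be illustrated in software. The sketch below is not the authors' circuit; it is a hypothetical model of the standard binary-merge scheme for streaming reduction, in which at most one partial sum per "level" (size class 2^k) is buffered at any time, so a single adder and about lg(n) buffer slots suffice to reduce an n-element stream.

```python
def reduce_stream(values):
    """Log-space streaming reduction: a software model, not a hardware design.

    Partial sums are kept one per level, like digits of a binary counter.
    Each merge of two equal-sized partials models one firing of the single
    pipelined adder; at most ~lg(n) slots are ever occupied at once.
    """
    slots = {}           # level k -> partial sum of 2**k inputs awaiting a partner
    max_occupied = 0     # peak number of buffered partials (the space bound)
    for v in values:
        level = 0
        # Carry propagation: merging two level-k partials yields a level-(k+1) partial.
        while level in slots:
            v += slots.pop(level)
            level += 1
        slots[level] = v
        max_occupied = max(max_occupied, len(slots))
    # Drain: at most ~lg(n) leftover partials are combined into the final result.
    return sum(slots.values()), max_occupied
```

For n = 100 inputs the model never buffers more than a handful of partials (the peak equals the maximum popcount of the running input count), consistent with the logarithmic space claim; a real circuit must additionally schedule the merges around the adder's pipeline latency, which is the data-hazard problem the paper addresses.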