An improved reduction algorithm with deeply pipelined operators

Authors:
Yi-Gang Tai;Chia-Tien Dan Lo;Kleanthis Psarris
Affiliations:
Department of Computer Science, University of Texas at San Antonio, San Antonio, TX;Department of Computer Science, University of Texas at San Antonio, San Antonio, TX;Department of Computer Science, University of Texas at San Antonio, San Antonio, TX
Venue:
SMC '09: Proceedings of the 2009 IEEE International Conference on Systems, Man and Cybernetics
Year:
2009

References:
An Improved Vector-Reduction Method. IEEE Transactions on Computers.
Designing Scalable FPGA-Based Reduction Circuits Using Pipelined Floating-Point Cores. IPDPS '05: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05), Workshop 3, Volume 04.
High-Performance FPGA-Based General Reduction Methods. FCCM '05: Proceedings of the 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.
High-Performance and Area-Efficient Reduction Circuits on FPGAs. SBAC-PAD '05: Proceedings of the 17th International Symposium on Computer Architecture on High Performance Computing.
An FPGA-Based Application-Specific Processor for Efficient Reduction of Multiple Variable-Length Floating-Point Data Sets. ASAP '06: Proceedings of the IEEE 17th International Conference on Application-specific Systems, Architectures and Processors.
Vector-Reduction Techniques for Arithmetic Pipelines. IEEE Transactions on Computers.
High-Performance Reduction Circuits Using Deeply Pipelined Operators on FPGAs. IEEE Transactions on Parallel and Distributed Systems.

Abstract

Many scientific applications involve reduction or accumulation operations on sequential data streams. For example, matrix-vector multiplication consists of multiple inner-product operations on different data sets. If the core operator of the reduction is deeply pipelined, which is usually the case, dependencies between the input data cause data hazards in the pipeline and call for a careful design. In this paper, we propose a modified design of the reduction operation based on Sips and Lin's method. We analyze the proposed design to prove the correctness of its timing and compare its performance against previous methods.
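The data hazard described in the abstract can be illustrated with a toy, cycle-level model. The sketch below is neither Sips and Lin's method nor the modified design proposed in this paper. It assumes a hypothetical adder with a fixed latency of `latency` cycles that accepts one operand pair per cycle, and it compares (1) naive in-order accumulation, which must stall until each result leaves the pipeline, with (2) a generic "recirculating partial sums" scheme that keeps the pipeline full and then combines the leftover partial sums. The names `PipelinedAdder`, `reduce_naive`, and `reduce_partial_sums` are illustrative only, and the model handles a single data set.

```python
from collections import deque


class PipelinedAdder:
    """Toy model of an adder with `latency` stages (hypothetical, for illustration).

    Every cycle, call drain() once and then issue() once. A sum issued in
    cycle t is returned by drain() in cycle t + latency.
    """

    def __init__(self, latency):
        self.stages = deque([None] * latency)

    def drain(self):
        # Value leaving the last stage this cycle (None if that slot was empty).
        return self.stages.pop()

    def issue(self, operands=None):
        # Start at most one new addition; None models an idle (bubble) slot.
        self.stages.appendleft(None if operands is None else operands[0] + operands[1])

    def empty(self):
        # True when no addition is still in flight.
        return all(s is None for s in self.stages)


def reduce_naive(xs, latency):
    """Accumulate in order; each add must wait for the previous result (RAW hazard).

    xs must be non-empty. Returns (sum, cycle in which the sum is available).
    """
    adder = PipelinedAdder(latency)
    acc, i, cycle, waiting = xs[0], 1, 0, False
    while True:
        out = adder.drain()
        if out is not None:
            acc, waiting = out, False      # result forwarded in the same cycle
        if i == len(xs) and not waiting:
            return acc, cycle
        if waiting:
            adder.issue(None)              # stall: operand not ready yet
        else:
            adder.issue((acc, xs[i]))
            i, waiting = i + 1, True
        cycle += 1


def reduce_partial_sums(xs, latency):
    """Issue one input per cycle against a recirculating partial sum, then pair
    up the (at most `latency`) leftover partial sums. xs must be non-empty."""
    adder = PipelinedAdder(latency)
    i, cycle, held = 0, 0, None
    while True:
        out = adder.drain()
        if i < len(xs):
            # Input phase: add the next input to whatever partial sum just left.
            adder.issue((xs[i], 0 if out is None else out))
            i += 1
        elif out is None:
            adder.issue(None)
        elif held is None:
            if adder.empty():
                return out, cycle          # last partial sum: reduction done
            held = out                     # wait for a partner partial sum
            adder.issue(None)
        else:
            adder.issue((held, out))       # combine two partial sums
            held = None
        cycle += 1


if __name__ == "__main__":
    xs = list(range(1, 101))
    print(reduce_naive(xs, 8))             # (5050, 792)
    print(reduce_partial_sums(xs, 8))
```

In this model, naive accumulation takes (n - 1) * latency cycles because every addition depends on the one before it. The partial-sums variant issues one input per cycle, so its cost is roughly n cycles plus a combining tail that grows with the latency but not with n. Practical reduction circuits must also handle several data sets arriving back to back (as in matrix-vector multiplication) without stalling, which this sketch does not attempt.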