An Improved Vector-Reduction Method
IEEE Transactions on Computers
Designing Scalable FPGA-Based Reduction Circuits Using Pipelined Floating-Point Cores
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 3 - Volume 04
High-Performance FPGA-Based General Reduction Methods
FCCM '05 Proceedings of the 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
High-Performance and Area-Efficient Reduction Circuits on FPGAs
SBAC-PAD '05 Proceedings of the 17th International Symposium on Computer Architecture on High Performance Computing
ASAP '06 Proceedings of the IEEE 17th International Conference on Application-specific Systems, Architectures and Processors
Vector-Reduction Techniques for Arithmetic Pipelines
IEEE Transactions on Computers
High-Performance Reduction Circuits Using Deeply Pipelined Operators on FPGAs
IEEE Transactions on Parallel and Distributed Systems
Many scientific applications involve reduction or accumulation operations on sequential data streams. Matrix-vector multiplication, for example, consists of multiple inner products computed over different data sets. If the core operator of the reduction is deeply pipelined, which is usually the case, dependencies between the input data cause data hazards in the pipeline and call for careful design. In this paper, we propose a modified design of the reduction operation based on Sips and Lin's method. We analyze the proposed design to prove that its timing is correct, and we compare its performance with that of previous methods.
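To make the data hazard concrete, the following Python sketch simulates a single pipelined adder in software. It is only an illustration of the general problem, not the design proposed in the paper and not Sips and Lin's method. The function names `stalled_sum` and `interleaved_sum`, the cycle-counting model, and the `latency` parameter are assumptions introduced here. The first function waits for each result before issuing the next addition. The second keeps one partial sum per pipeline stage so that a new input can be issued every cycle.

```python
from collections import deque


def stalled_sum(values, latency):
    """Accumulate with one running total; each add waits for the previous result."""
    total, cycle = 0, 0
    for v in values:
        total += v
        cycle += latency  # the next add cannot issue until this result is ready
    return total, cycle


def interleaved_sum(values, latency):
    """Accumulate into `latency` partial sums, issuing one add per cycle.

    Returns the total and the cycle at which the last pipelined add completes.
    The final merge of the partial sums is done with a plain sum() and is
    not counted in the cycle figure (a simplification of this sketch).
    """
    partial = [0] * latency   # one running sum per pipeline stage
    in_flight = deque()       # entries: (ready_cycle, slot, result)
    cycle = 0
    for i, v in enumerate(values):
        # Retire every add whose result is available by this cycle.
        while in_flight and in_flight[0][0] <= cycle:
            _, slot, result = in_flight.popleft()
            partial[slot] = result
        slot = i % latency
        # No pending add targets this slot, so reading partial[slot] is hazard-free.
        assert all(s != slot for _, s, _ in in_flight)
        in_flight.append((cycle + latency, slot, partial[slot] + v))
        cycle += 1
    # Drain the pipeline; the remaining results retire in issue order.
    while in_flight:
        cycle, slot, result = in_flight.popleft()
        partial[slot] = result
    return sum(partial), cycle


if __name__ == "__main__":
    data = list(range(1, 11))
    print(stalled_sum(data, latency=4))
    print(interleaved_sum(data, latency=4))
```

Both functions return the same total, but the interleaved version leaves `latency` partial sums that still have to be merged. This sketch does not model that merge, nor the case where the next data set arrives before the merge has finished.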