Accelerating Matrix Operations with Improved Deeply Pipelined Vector Reduction

Authors:
Yi-Gang Tai;Chia-Tien Dan Lo;Kleanthis Psarris
Affiliations:
University of Texas at San Antonio, San Antonio;Southern Polytechnic State University, Marietta;University of Texas at San Antonio, San Antonio
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2012


Abstract

Many scientific and engineering applications involve matrix operations, in which vector reduction is a common step. When the core operator of the reduction is deeply pipelined, as is usually the case, dependencies between the input data elements cause data hazards. To tackle this problem, we propose a new reduction method with low latency and high pipeline utilization. We evaluate the performance of the proposed design in both single-data-set and multiple-data-set scenarios. Further, we use QR decomposition to demonstrate how the proposed method can accelerate the decomposition. We implement the design on an FPGA and compare its results with those of other methods.
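
The data hazard described in the abstract can be illustrated with a toy cycle-count model. The sketch below is not the method proposed in the paper. It assumes a hypothetical adder that accepts one operation per cycle and returns its result `latency` cycles later, and it compares serial accumulation with a generic greedy scheme that adds any two operands as soon as both are available. The function names and the timing model are assumptions made for illustration only.

```python
from collections import deque


def naive_accumulate(values, latency):
    """Serial accumulation: each add waits for the previous sum.

    Assumed model: the next add cannot issue until the running sum
    has left the pipeline, so every add costs `latency` cycles.
    """
    total, cycle = 0, 0
    for v in values:
        total += v
        cycle += latency  # stall until this result is available
    return total, cycle


def greedy_pairwise_reduce(values, latency):
    """Generic greedy reduction (illustrative, not the paper's design).

    One input arrives per cycle. Whenever two operands are available,
    one add is issued; its result becomes available `latency` cycles later.
    Returns (sum, cycles). Note that this reorders the additions, which
    can change rounding for floating-point inputs.
    """
    if not values:
        return 0, 0
    pending = deque(values)   # inputs not yet delivered
    ready = []                # operands available for an add
    in_flight = deque()       # (ready_cycle, partial_sum), in issue order
    cycle = 0
    while pending or in_flight or len(ready) > 1:
        # Retire results whose latency has elapsed.
        while in_flight and in_flight[0][0] <= cycle:
            ready.append(in_flight.popleft()[1])
        # Accept at most one new input this cycle.
        if pending:
            ready.append(pending.popleft())
        # Issue at most one add this cycle.
        if len(ready) >= 2:
            a, b = ready.pop(), ready.pop()
            in_flight.append((cycle + latency, a + b))
        cycle += 1
    return ready[0], cycle


if __name__ == "__main__":
    data = list(range(1, 65))
    print(naive_accumulate(data, 8))
    print(greedy_pairwise_reduce(data, 8))
```

Under this model the serial version pays the full pipeline depth for every element, while the greedy version keeps the adder busy with independent partial sums and only pays extra latency when folding the last few operands. A hardware design must additionally bound the buffering for partial sums and keep interleaved data sets separate, which this sketch does not attempt.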