Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Monte Carlo simulation of the Ising model on FPGA
Journal of Computational Physics
Self-Alignment Schemes for the Implementation of Addition-Related Floating-Point Operators
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Scalable matrix decompositions with multiple cores on FPGAs
Microprocessors & Microsystems
Hi-index | 0.01 |
Many scientific or engineering applications involve matrix operations, in which reduction of vectors is a common operation. If the core operator of the reduction is deeply pipelined, which is usually the case, dependencies between the input data elements cause data hazards. To tackle this problem, we propose a new reduction method with low latency and high pipeline utilization. The performance of the proposed design is evaluated for both single data set and multiple data set scenarios. Further, QR decomposition is used to demonstrate how the proposed method can accelerate its execution. We implement the design on an FPGA and compare its results to other methods.