A truly two-dimensional systolic array FPGA implementation of QR decomposition

Authors:
Xiaojun Wang; Miriam Leeser
Affiliations:
Airvana; Northeastern University, Boston, MA
Venue:
ACM Transactions on Embedded Computing Systems (TECS)
Year:
2009

Abstract

We have implemented a two-dimensional systolic array QR decomposition on a Xilinx Virtex-5 FPGA using the Givens rotation algorithm. QR decomposition is a key step in many DSP applications, including sonar beamforming, channel equalization, and 3G wireless communication. Whereas previous work implements Givens rotations using a one-dimensional systolic array, our implementation uses a truly two-dimensional systolic array architecture. As a result, latency scales well for larger matrices. In addition, prior work avoids the divide and square root operations in the Givens rotation algorithm by using special operations such as CORDIC or special number systems such as the logarithmic number system (LNS). In contrast, our design uses straightforward floating-point divide and square root implementations, which makes it easier to use within a larger system. The input matrix size can be configured at compile time to many different sizes, making the design easily scalable to future large FPGAs or across multiple FPGAs. The QR module is fully pipelined with a throughput of over 130 MHz for the IEEE single-precision floating-point format. The peak performance for a 12 × 12 input matrix is approximately 35 GFLOPS.
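
For readers unfamiliar with the algorithm named in the abstract, the following is a minimal sequential sketch of QR decomposition by Givens rotations. It computes each rotation with a plain square root and two divides, the operations the abstract says the design implements directly rather than replacing with CORDIC or LNS. It is an illustration only, not the authors' hardware design: it runs in double-precision software with no systolic parallelism or pipelining, and the function name `givens_qr` and the example matrix are assumptions introduced here.

```python
import math


def givens_qr(a):
    """Factor a (list of rows, m x n) as q * r using Givens rotations.

    Hypothetical helper for illustration: q is m x m orthogonal,
    r is m x n upper triangular.
    """
    m, n = len(a), len(a[0])
    r = [[float(v) for v in row] for row in a]
    # qt accumulates the product of all rotations, i.e. the transpose of q.
    qt = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]

    for j in range(min(m - 1, n)):
        for i in range(j + 1, m):
            x, y = r[j][j], r[i][j]
            if y == 0.0:
                continue  # entry is already zero, no rotation needed
            # One square root and two divides per rotation.
            norm = math.sqrt(x * x + y * y)
            c, s = x / norm, y / norm
            # Apply the same rotation to rows j and i of both r and qt.
            for mat in (r, qt):
                row_j, row_i = mat[j], mat[i]
                mat[j] = [c * p + s * q for p, q in zip(row_j, row_i)]
                mat[i] = [c * q - s * p for p, q in zip(row_j, row_i)]
            r[i][j] = 0.0  # clear rounding residue in the eliminated entry

    q = [list(col) for col in zip(*qt)]
    return q, r


# Example input chosen for illustration only.
a = [[12.0, -51.0, 4.0], [6.0, 167.0, -68.0], [-4.0, 24.0, -41.0]]
q, r = givens_qr(a)
```

Each rotation in this sketch touches only two rows, and rotations on different row pairs do not depend on each other. That independence is the general property that lets Givens rotations be distributed over an array of processing cells.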