Journal of Signal Processing Systems
Matrix inversion and triangularization problems are common to a wide variety of communication systems, signal processing applications, and the solution of sets of linear equations. Matrix inversion is computationally intensive, and its hardware implementation based on fixed-point (FP) arithmetic is a challenging problem. This paper proposes a fully parallel VLSI architecture under fixed-precision arithmetic for computing the inverse of a real square matrix using QR decomposition with Modified Gram-Schmidt (MGS) orthogonalization. The MGS algorithm is stable and accurate to within integral multiples of machine precision under fixed-precision arithmetic for a well-conditioned non-singular matrix. For the typical 4 × 4 matrices found in MIMO communication systems, the proposed architecture achieves a clock rate of 277 MHz with a latency of 18 time units and an area of 72K gates in 0.18-µm CMOS technology. For a generic square matrix of order n, the required latency is 5n - 2 time units, which is lower than that of all previously known architectures. With the use of LUTs and log-domain computations, the total area is reduced compared to architectures based on linear-domain computations.
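The inversion route named in the abstract (QR decomposition by MGS, then inverting the factors) can be illustrated with a minimal software sketch. This is not the paper's architecture: it assumes floating-point arithmetic in the linear domain rather than fixed-point, log-domain hardware, and the function names mgs_qr, invert_upper and mgs_inverse are hypothetical names chosen for illustration. It relies only on the identity A^-1 = R^-1 Q^T for A = QR.

```python
import math


def mgs_qr(a):
    """Modified Gram-Schmidt QR of a square matrix given as a list of rows.

    Returns (q, r) where q[k] is the k-th column of Q and r is upper triangular.
    """
    n = len(a)
    # v[j] starts as column j of A and is orthogonalized step by step.
    v = [[a[i][j] for i in range(n)] for j in range(n)]
    q = [None] * n
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):
        r[k][k] = math.sqrt(sum(x * x for x in v[k]))
        if r[k][k] == 0.0:
            # Only catches exactly dependent columns; assumed good enough here.
            raise ValueError("matrix is singular")
        q[k] = [x / r[k][k] for x in v[k]]
        # MGS: project q[k] out of every remaining, already-updated column.
        for j in range(k + 1, n):
            r[k][j] = sum(qi * vi for qi, vi in zip(q[k], v[j]))
            v[j] = [vi - r[k][j] * qi for vi, qi in zip(v[j], q[k])]
    return q, r


def invert_upper(r):
    """Invert an upper-triangular matrix by back substitution."""
    n = len(r)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        inv[j][j] = 1.0 / r[j][j]
        for i in range(j - 1, -1, -1):
            s = sum(r[i][k] * inv[k][j] for k in range(i + 1, j + 1))
            inv[i][j] = -s / r[i][i]
    return inv


def mgs_inverse(a):
    """Invert a well-conditioned square matrix as R^-1 Q^T."""
    n = len(a)
    q, r = mgs_qr(a)
    r_inv = invert_upper(r)
    # q[k] is column k of Q, which is row k of Q^T.
    return [[sum(r_inv[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    a = [[4.0, 1.0, 0.5, 0.2],
         [1.0, 5.0, 1.0, 0.3],
         [0.5, 1.0, 6.0, 1.0],
         [0.2, 0.3, 1.0, 7.0]]
    for row in mgs_inverse(a):
        print(["%.6f" % x for x in row])
```

The sketch orthogonalizes columns sequentially, so it says nothing about the 5n - 2 latency, which is a property of the parallel hardware schedule described in the paper.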