Architecture and FPGA design of dichotomous coordinate descent algorithms

  • Authors:
  • Jie Liu;Yuriy V. Zakharov;Ben Weaver

  • Affiliations:
  • Department of Electronics, The University of York, York, UK;Department of Electronics, The University of York, York, UK;Department of Electronics, The University of York, York, UK

  • Venue:
  • IEEE Transactions on Circuits and Systems Part I: Regular Papers
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

In the areas of signal processing and communications, such as antenna-array beamforming, adaptive filtering, multiuser and multiple-input-multiple-output (MIMO) detection, channel estimation and equalization, echo and interference cancellation, and others, solving linear systems of equations often provides an optimal performance. However, this is also a very complicated operation that designers try to avoid by proposing different suboptimal techniques. The dichotomous coordinate descent (DCD) algorithm allows linear systems of equations to be solved with high computational efficiency. In this paper, we present architectures and field-programmable gate-array (FPGA) designs of two variants of the DCD algorithm, which are known as cyclic and leading DCD algorithms. For each of these techniques, we present serial designs, group-2 and group-4 designs, as well as a design with parallel update of the residual vector for the cyclic DCD algorithm. These designs have different degrees of parallelism, thus enabling a tradeoff between FPGA resources and computation time. The serial designs require the smallest FPGA resources; they are well suited for applications where many parallel solvers are required, e.g., for detection in MIMO-orthogonal-frequency-division-multiplexing communication systems. The parallelism introduced in the proposed group-2 and group-4 designs allows faster convergence to the true solution at the expense of an increase in FPGA resources. The design with parallel update of the residual vector provides the fastest convergence speed; however, if the system size is high, it may result in a significant increase in FPGA resources. The proposed fixed-point designs provide an accuracy performance that is very close to the performance of floating-point counterparts and require significantly lower FPGA resources than techniques based on QR decomposition.