Efficient Partitioning of Algorithms for Long Convolutions and their Mapping onto Architectures

  • Authors:
  • Laurens Bierens; Ed Deprettere

  • Affiliations:
  • TNO Physics and Electronics Laboratory, P.O. Box 96864, 2509 JG The Hague, The Netherlands. E-mail: bierens@fel.tno.nl; Delft Univ. of Technology, Dept. of Electrical Engineering, P.O. Box 5031, 2600 GA Delft, The Netherlands. E-mail: ed@cas.et.tudelft.nl

  • Venue:
  • Journal of VLSI Signal Processing Systems - Special issue on systematic trade-off analysis in signal processing systems design
  • Year:
  • 1998

Abstract

We present an efficient approach to partitioning algorithms that implement long convolutions. The dependence graph (DG) of a convolution algorithm is partitioned, in locally sequential globally parallel (LSGP) fashion, into smaller, less complex convolution algorithms. The LSGP-partitioned DG is mapped onto a signal flow graph (SFG) in which each processor element (PE) performs a small convolution algorithm. The key is then to reduce the complexity of the SFG in two steps: 1. local reduction of complexity: the short Fast Fourier Transform (FFT) is used to perform the small convolution within the PE; and 2. global reduction of complexity: the short FFTs within the PEs are relocated to the global level, where redundant short FFT operations are eliminated. The operation remaining within the PEs is then a simple element-wise multiply-add. After a graph transform, the structure of the SFG kernel is recognized as a set of parallel small convolutions. Using the short FFT to perform these short convolutions as well yields our final realization of the long convolution algorithm. The computational complexity of this realization is close to the optimum for convolutions, that is, O(N log N). Our approach thus attains this near-N log N complexity without having to implement large-size FFTs; it uses small FFT blocks instead. The advantage is that small FFT transforms are commercially available and can even be implemented in single-chip VLSI architectures. Our final SFG is three-dimensional and can be mapped efficiently onto prototype architectures or dedicated VLSI processors. We demonstrate the procedure in the paper with a design example: the implementation of a prototype convolution architecture that we designed for a real-time radar imaging system.
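
The two reduction steps described in the abstract can be illustrated with a minimal software sketch. It is not the authors' architecture or code. It assumes a block length `block` that is a power of two, a short FFT of size `2 * block`, and hypothetical function names (`fft`, `direct_convolution`, `partitioned_convolution`) chosen only for this illustration. Each input block is transformed once at the "global" level, the inner loop that stands in for a PE performs only an element-wise multiply-add, and the results are recombined by overlap-add.

```python
import cmath


def fft(a, invert=False):
    """Unscaled recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def direct_convolution(x, h):
    """Reference O(N^2) linear convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def partitioned_convolution(x, h, block=4):
    """Linear convolution of x and h using only FFTs of size 2 * block."""
    size = 2 * block  # long enough for a (2 * block - 1)-point block product
    assert size & (size - 1) == 0, "2 * block must be a power of two"

    def spectra(signal):
        # Global level: each block is zero-padded and transformed exactly once.
        result = []
        for start in range(0, len(signal), block):
            chunk = list(signal[start:start + block])
            chunk += [0.0] * (size - len(chunk))
            result.append(fft(chunk))
        return result

    X = spectra(x)
    H = spectra(h)
    y = [0.0] * ((len(X) + len(H)) * block)
    for k in range(len(X) + len(H) - 1):
        acc = [0j] * size
        for i in range(len(X)):
            j = k - i
            if 0 <= j < len(H):
                # PE-level work: element-wise multiply-add of two spectra.
                for f in range(size):
                    acc[f] += X[i][f] * H[j][f]
        # One inverse short FFT per output block, then overlap-add.
        segment = fft(acc, invert=True)
        for n in range(size):
            y[k * block + n] += segment[n].real / size
    return y[:len(x) + len(h) - 1]
```

In this sketch, the accumulation over `i` for a fixed frequency bin `f` is itself a small convolution along the block index, which appears to correspond to the "set of parallel small convolutions" mentioned in the abstract. The sketch evaluates those sums directly rather than with a further short FFT.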