Efficient Partitioning of Algorithms for Long Convolutions and their Mapping onto Architectures

Authors:
Laurens Bierens;Ed Deprettere
Affiliations:
TNO Physics and Electronics Laboratory, P.O.Box 96864, 2509 JG The Hague, The Netherlands. E-mail: bierens@fel.tno.nl;Delft Univ. of Technology, Dept. of Electrical Engineering, P.O.Box 5031, 2600 GA Delft, The Netherlands. E-mail: ed@cas.et.tudelft.nl
Venue:
Journal of VLSI Signal Processing Systems - Special issue on systematic trade-off analysis in signal processing systems design
Year:
1998

Abstract

We present an efficient approach for partitioning algorithms that implement long convolutions. The dependence graph (DG) of a convolution algorithm is partitioned in a locally sequential, globally parallel (LSGP) manner into smaller, less complex convolution algorithms. The LSGP-partitioned DG is mapped onto a signal flow graph (SFG) in which each processor element (PE) performs a small convolution algorithm. The key is then to reduce the complexity of the SFG in two steps: 1. local reduction of complexity: a short Fast Fourier Transform (FFT) is used to perform the small convolution within each PE; and 2. global reduction of complexity: the short FFTs within the PEs are relocated to the global level, where redundant short FFT operations are eliminated. The remaining operation within the PEs is then a simple element-wise multiply-add. After a graph transformation, the structure of the SFG kernel is recognized as a set of parallel small convolutions. Using the short FFT to perform these small convolutions as well yields our final realization of the long convolution algorithm. The computational complexity of this realization is close to the optimum for convolutions, that is, O(N log N). Our approach thus achieves this low N log N complexity without having to implement large-size FFTs; it uses small FFT blocks instead. The advantage is that small FFTs are commercially available and can even be implemented in single-chip VLSI architectures. Our final SFG is three-dimensional and can be mapped efficiently onto prototype architectures or dedicated VLSI processors. We demonstrate the procedure with a design example: the implementation of a prototype convolution architecture that we designed for a real-time radar imaging system.
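The partitioning idea can be illustrated in software. What follows is a minimal Python sketch, not the authors' SFG or their radar design. It rests on several assumptions:

- The block length B is a power of two.
- A plain recursive radix-2 FFT of size 2B stands in for the commercially available short FFT blocks.
- All function names (`fft`, `ifft`, `direct_conv`, `split_blocks`, `block_spectra`, `overlap_add`, `partitioned_conv`, `fully_fft_conv`) are hypothetical.

`partitioned_conv` mirrors the state after the global reduction. Every input and filter block is transformed once. The per-PE work is an element-wise multiply-add over all block pairs (i, j) with i + j = k. One inverse short FFT per output block k is then overlap-added at offset kB.

`fully_fft_conv` mirrors the final step. For each frequency bin f, the multiply-add over the block index is itself a short convolution, so it is also computed with a short FFT.

```python
import cmath

# Illustrative sketch only: function names and the block length B are
# assumptions, not taken from the paper. B must be a power of two.


def fft(a, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(a):
    """Inverse FFT with 1/n normalization."""
    n = len(a)
    return [v / n for v in fft(a, inverse=True)]


def direct_conv(x, h):
    """Reference O(N^2) linear convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y


def split_blocks(v, B):
    """Zero-pad v to a multiple of B and cut it into length-B blocks."""
    v = list(v) + [0.0] * (-len(v) % B)
    return [v[i:i + B] for i in range(0, len(v), B)]


def block_spectra(v, B):
    """Global level: transform each block once, zero-padded to 2B (no wrap-around)."""
    return [fft(b + [0.0] * B) for b in split_blocks(v, B)]


def overlap_add(Z, B, length):
    """One inverse short FFT per output block k, added in at offset k*B."""
    y = [0.0] * ((len(Z) + 1) * B)
    for k, spectrum in enumerate(Z):
        for n, v in enumerate(ifft(spectrum)):
            y[k * B + n] += v.real
    return y[:length]


def partitioned_conv(x, h, B):
    """After global reduction: the 'PEs' only do element-wise multiply-adds."""
    X, H = block_spectra(x, B), block_spectra(h, B)
    P, Q = len(X), len(H)
    Z = []
    for k in range(P + Q - 1):
        acc = [0j] * (2 * B)
        for i in range(max(0, k - Q + 1), min(k, P - 1) + 1):
            for f in range(2 * B):
                acc[f] += X[i][f] * H[k - i][f]
        Z.append(acc)
    return overlap_add(Z, B, len(x) + len(h) - 1)


def fully_fft_conv(x, h, B):
    """Final form: per bin f, the multiply-add over the block index is a short
    convolution of length-P and length-Q sequences, also done by a short FFT."""
    X, H = block_spectra(x, B), block_spectra(h, B)
    P, Q = len(X), len(H)
    M = 1
    while M < P + Q - 1:  # FFT length along the block axis, avoids wrap-around
        M *= 2
    Z = [[0j] * (2 * B) for _ in range(P + Q - 1)]
    for f in range(2 * B):
        xs = fft([X[i][f] for i in range(P)] + [0j] * (M - P))
        hs = fft([H[j][f] for j in range(Q)] + [0j] * (M - Q))
        zs = ifft([a * b for a, b in zip(xs, hs)])
        for k in range(P + Q - 1):
            Z[k][f] = zs[k]
    return overlap_add(Z, B, len(x) + len(h) - 1)
```

The sketch shows why the global relocation pays off. Take P input blocks and Q filter blocks. A naive per-PE version would transform both blocks and invert the product for every pair (i, j), which costs 3PQ short FFTs. Here only P + Q forward and P + Q - 1 inverse short FFTs are needed. For example, `partitioned_conv([1.0, 2.0, 3.0, 4.0, 5.0], [0.5, -1.0, 2.0], 2)` should agree with `direct_conv` on the same inputs up to floating-point rounding.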