We present an efficient approach for partitioning algorithms that implement long convolutions. The dependence graph (DG) of a convolution algorithm is partitioned, using the locally sequential globally parallel (LSGP) scheme, into smaller, less complex convolution algorithms. The LSGP-partitioned DG is mapped onto a signal flow graph (SFG) in which each processor element (PE) performs a small convolution algorithm. The key is then to reduce the complexity of the SFG in two steps: (1) local reduction of complexity: the short Fast Fourier Transform (FFT) is used to perform the small convolution within the PE; and (2) global reduction of complexity: the short FFTs within the PEs are relocated to the global level, where redundant short FFT operations are eliminated. The remaining operation within the PEs is now a simple element-wise multiply-add. After a graph transform, the structure of the SFG kernel is recognized as a set of parallel small convolutions. If we use the short FFT to perform these short convolutions, we arrive at our final realization of the long convolution algorithm. The computational complexity of this realization is close to the optimum for convolutions, that is, O(N log N). Our approach thus achieves this O(N log N) complexity without having to implement large-size FFTs; we use small FFT blocks instead. The advantage is that small FFTs are commercially available and can even be implemented in single-chip VLSI architectures. Our final SFG is three-dimensional and can be mapped efficiently onto prototype architectures or dedicated VLSI processors. We demonstrate the procedure with a design example: the implementation of a prototype convolution architecture that we designed for a real-time radar imaging system.
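The two reduction steps can be illustrated with a minimal software sketch. It is not the architecture described in the paper, only an illustration under stated assumptions: the block length `block` is a power of two, both sequences are zero-padded to a multiple of `block`, and the names `fft`, `ifft`, `direct_convolution`, and `partitioned_convolution` are hypothetical helpers introduced here. Each short block is transformed exactly once (the "global level"), the inner loop is an element-wise multiply-add (the role the abstract assigns to the PEs), and the result is reassembled by overlap-add.

```python
import cmath


def fft(a, inverse=False):
    """Unscaled recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(a):
    """Inverse FFT with 1/n scaling."""
    n = len(a)
    return [v / n for v in fft(a, inverse=True)]


def direct_convolution(x, h):
    """O(N^2) reference: y[n] = sum over i of x[i] * h[n - i]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def partitioned_convolution(x, h, block=4):
    """Long convolution built only from short FFTs of size 2 * block."""

    def split(s):
        # Zero-pad to a multiple of `block`, then cut into short blocks.
        s = list(s) + [0.0] * (-len(s) % block)
        return [s[i:i + block] for i in range(0, len(s), block)]

    xb, hb = split(x), split(h)
    size = 2 * block  # large enough for a block-by-block linear convolution

    # Global level: every short block is transformed exactly once,
    # instead of once per (i, j) pair inside each processing element.
    X = [fft(b + [0.0] * block) for b in xb]
    H = [fft(b + [0.0] * block) for b in hb]

    # PE level: element-wise multiply-add, accumulated per output block i + j.
    nblocks = len(xb) + len(hb) - 1
    Y = [[0j] * size for _ in range(nblocks)]
    for i, Xi in enumerate(X):
        for j, Hj in enumerate(H):
            acc = Y[i + j]
            for f in range(size):
                acc[f] += Xi[f] * Hj[f]

    # One short inverse FFT per output block, then overlap-add.
    y = [0.0] * ((nblocks + 1) * block)
    for k, Yk in enumerate(Y):
        for n, v in enumerate(ifft(Yk)):
            y[k * block + n] += v.real
    return y[:len(x) + len(h) - 1]


if __name__ == "__main__":
    x = [float(v) for v in range(1, 11)]
    h = [1.0, -1.0, 0.5, 2.0, 0.0, 3.0]
    print(partitioned_convolution(x, h, block=4))
    print(direct_convolution(x, h))
```

In this sketch, the accumulation over all pairs with `i + j == k` for a fixed frequency bin `f` is itself a short convolution along the block index. One plausible reading of the abstract is that these are the "parallel small convolutions" that are again computed with short FFTs; the sketch leaves them as a direct double loop for clarity.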