This thesis surveys algorithms for computing linear and cyclic convolution. The algorithms are presented in a uniform mathematical notation that allows automatic derivation, optimization, and implementation. Using the tensor product and the Chinese Remainder Theorem (CRT), a space of algorithms is defined, and the task of finding the best algorithm is turned into an optimization problem over this space. This formulation led to the discovery of new algorithms with reduced operation counts. Symbolic tools are presented for deriving and implementing algorithms, and performance analyses are carried out using both operation count and run time as metrics. These analyses show the existence of a window in which CRT-based algorithms outperform other methods of computing convolutions. Finally, a new method that combines the Fast Fourier Transform with the CRT methods is derived. For some very large convolutions, this combined method is shown to be faster than either method used alone.
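
To make the two kinds of convolution and the role of the CRT concrete, the following is a minimal sketch and not code from the thesis. It treats linear convolution as polynomial multiplication and cyclic convolution as multiplication modulo x^N - 1, then shows the smallest CRT-based case: a 2-point cyclic convolution that uses the factorization x^2 - 1 = (x - 1)(x + 1) to replace four multiplications with two. All function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def linear_convolution(a, b):
    """Direct linear convolution: coefficients of the product a(x) * b(x)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def cyclic_convolution(a, b):
    """Direct cyclic convolution: a(x) * b(x) reduced modulo x^N - 1."""
    n = len(a)
    if len(b) != n:
        raise ValueError("cyclic convolution needs equal-length inputs")
    out = [0] * n
    for i in range(n):
        for j in range(n):
            # Exponents wrap around because x^N = 1 modulo x^N - 1.
            out[(i + j) % n] += a[i] * b[j]
    return out


def cyclic_convolution_2_crt(a, b):
    """2-point cyclic convolution via the CRT (illustrative assumption).

    x^2 - 1 factors as (x - 1)(x + 1). Reducing modulo each factor is the
    same as evaluating at x = 1 and x = -1, so each residue product costs
    one multiplication. The CRT then reconstructs c(x) = c0 + c1 * x.
    """
    a0, a1 = a
    b0, b1 = b
    m0 = (a0 + a1) * (b0 + b1)  # product modulo (x - 1)
    m1 = (a0 - a1) * (b0 - b1)  # product modulo (x + 1)
    half = Fraction(1, 2)       # exact arithmetic avoids rounding error
    return [half * (m0 + m1), half * (m0 - m1)]


if __name__ == "__main__":
    print(linear_convolution([1, 2, 3], [4, 5]))
    print(cyclic_convolution([1, 2], [3, 4]))
    print(cyclic_convolution_2_crt([1, 2], [3, 4]))
```

The CRT version trades multiplications for additions and a scaling by 1/2, which is the kind of operation-count trade-off the abstract describes; larger sizes rely on further factors of x^N - 1 and are not covered by this sketch.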