Discrete wavelet transform: data dependence analysis and synthesis of distributed memory and control array architectures

Authors:
J. Fridman; E.S. Manolakos
Affiliations:
Analog Devices Inc., Norwood, MA;-
Venue:
IEEE Transactions on Signal Processing
Year:
1997

Citing 0
Cited 12

On the Scalability of 2-D Discrete Wavelet Transform Algorithms

Multidimensional Systems and Signal Processing
DG2VHDL: A Tool to Facilitate the High Level Synthesis of Parallel Processing Array Architectures

Journal of VLSI Signal Processing Systems - Special issue on recent advances in the design and implementation of signal processing systems
A Programmable Parallel VLSI Architecture for 2-D Discrete Wavelet Transform

Journal of VLSI Signal Processing Systems - Parallel VLSI architectures for image and video processing
A Parallel Architecture for the 2-D Discrete Wavelet Transform with Integer Lifting Scheme

Journal of VLSI Signal Processing Systems - Parallel VLSI architectures for image and video processing
Distributed Memory Parallel Architecture Based on Modular Linear Arrays for 2-D Separable Transforms Computation

Journal of VLSI Signal Processing Systems - Parallel VLSI architectures for image and video processing
VLSI architectures of the 1-D and 2-D discrete wavelet transforms for JPEG 2000

Signal Processing
Generalized High-Level Synthesis of Wavelet-Based Digital Systems via Nonlinear I/O Data Space Transformations

ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
VLSI Implementation of 2-D DWT/IDWT Cores using 9/7-tap filter banks based on the Non-expansive Symmetric Extension Scheme

ASP-DAC '02 Proceedings of the 2002 Asia and South Pacific Design Automation Conference
A Novel Architecture for Lifting-Based Discrete Wavelet Transform for JPEG2000 Standard suitable for VLSI implementation

VLSID '03 Proceedings of the 16th International Conference on VLSI Design
Non-RAM-based architectural designs of wavelet-based digital systems based on novel nonlinear I/O data space transformations

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A novel VLSI architecture for real-time line-based wavelet transform using lifting scheme

Journal of Computer Science and Technology
Handling large-size discrete wavelet transform on network-based computing systems - parallelization via divisible load paradigm

Journal of Parallel and Distributed Computing

Quantified Score

Hi-index	35.68

Abstract

We perform a thorough data dependence and localization analysis for the discrete wavelet transform (DWT) algorithm and then use it to synthesize distributed memory and control architectures for its parallel computation. The DWT is characterized by a nonuniform data dependence structure owing to the decimation operation. It is neither a uniform recurrence equation (URE) nor an affine recurrence equation (ARE), and consequently it cannot be transformed directly into efficient array architectures using linear space-time mapping methods. Our approach is to first apply appropriate nonlinear transformations operating on the algorithm's index space, leading to a new DWT formulation on which linear space-time mapping becomes effective. The first transformation regularizes the interoctave dependencies, but alone it does not lead to efficient array solutions after the mapping because of limitations associated with transforming the three-dimensional (3-D) algorithm onto one-dimensional (1-D) arrays, a step also known as multiprojection. The second transformation removes the need for multiprojection by formulating the regularized DWT algorithm in a two-dimensional (2-D) index space. Using this DWT formulation, we have synthesized two VLSI-amenable linear arrays of L processing elements (PEs) computing a J-octave DWT decomposition with latencies of M and 2M-1, respectively, where L is the wavelet filter length, M is the number of samples in the data sequence, and J is the highest octave. The arrays are modular, regular, use simple control, and can be easily extended to larger L and J. The latency of both arrays is independent of J, and the efficiency is nearly 100% for any M, with one design achieving the lowest possible latency of M.
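To make the role of decimation concrete, the following is a minimal sequential sketch of a multi-octave DWT, not the array architecture synthesized in the paper. It assumes the common textbook recurrence a_j[n] = sum_k h[k] * a_(j-1)[2n - k] (and likewise for the detail output with g), with zero padding at the boundaries. The function names `dwt` and `read_offsets`, the boundary rule, and the filter values in the usage note are illustrative assumptions, not taken from the source.

```python
def dwt(x, h, g, octaves):
    """Sequential multi-octave DWT by filtering and decimation by 2.

    x       : input samples (length M)
    h, g    : low-pass and high-pass filters, both of length L
    octaves : number of octaves J to compute
    Returns (final approximation, list of detail sequences per octave).
    Assumption: samples outside the sequence are treated as zero.
    """
    approx = list(x)
    details = []
    for _ in range(octaves):
        a_next, d_next = [], []
        for n in range(len(approx) // 2):
            lo = hi = 0.0
            for k in range(len(h)):
                i = 2 * n - k  # decimation: output n reads input 2n - k
                if 0 <= i < len(approx):
                    lo += h[k] * approx[i]
                    hi += g[k] * approx[i]
            a_next.append(lo)
            d_next.append(hi)
        details.append(d_next)
        approx = a_next  # the next octave consumes the low-pass output
    return approx, details


def read_offsets(n, filter_len):
    """Offsets between output index n and the input indices 2n - k it reads."""
    return [n - (2 * n - k) for k in range(filter_len)]
```

For example, calling `dwt` on 8 samples with two-tap averaging and differencing filters (h = [0.5, 0.5], g = [0.5, -0.5]) and 3 octaves yields detail sequences of lengths 4, 2, and 1. `read_offsets(0, 2)` and `read_offsets(3, 2)` return different lists, which illustrates that the offset between an output and the inputs it reads changes with n instead of being constant. This is one way to see why the dependence structure is described as nonuniform.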