Discrete wavelet transform: data dependence analysis and synthesis of distributed memory and control array architectures

  • Authors:
  • J. Fridman; E.S. Manolakos

  • Affiliations:
  • Analog Devices Inc., Norwood, MA

  • Venue:
  • IEEE Transactions on Signal Processing
  • Year:
  • 1997

Quantified Score

Hi-index 35.68

Abstract

We perform a thorough data dependence and localization analysis for the discrete wavelet transform (DWT) algorithm and then use it to synthesize distributed memory and control architectures for its parallel computation. The DWT is characterized by a nonuniform data dependence structure owing to the decimation operation: it is neither a uniform recurrence equation (URE) nor an affine recurrence equation (ARE) and consequently cannot be transformed directly into efficient array architectures using linear space-time mapping methods. Our approach is first to apply appropriate nonlinear transformations operating on the algorithm's index space, leading to a new DWT formulation on which linear space-time mapping becomes effective. The first transformation regularizes the interoctave dependencies but alone does not lead to efficient array solutions after the mapping, owing to limitations associated with transforming the three-dimensional (3-D) algorithm onto one-dimensional (1-D) arrays, a step also known as multiprojection. The second transformation removes the need for multiprojection by formulating the regularized DWT algorithm in a two-dimensional (2-D) index space. Using this DWT formulation, we have synthesized two VLSI-amenable linear arrays of L processing elements (PEs) computing a J-octave DWT decomposition with latencies of M and 2M-1, respectively, where L is the wavelet filter length, M is the number of samples in the data sequence, and J is the highest octave. The arrays are modular, regular, use simple control, and can be easily extended to larger L and J. The latency of both arrays is independent of the highest octave J, and the efficiency is nearly 100% for any M, with one design achieving the lowest possible latency of M.
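
To make the nonuniform dependence structure concrete, the following is a minimal sequential sketch of a J-octave DWT as repeated filtering and decimation by two. It is not the array architecture or the index-space transformations described in the abstract. The function name dwt_octaves, the indexing convention 2n - k, the zero-valued boundary handling, and the averaging/differencing filters in the usage note are all assumptions made for illustration.

```python
def dwt_octaves(x, h, g, J):
    """Sequential J-octave DWT sketch (illustrative, not the paper's arrays).

    x: input samples (length M), h: low-pass taps, g: high-pass taps
    (both of length L). Returns (details, approx), where details[j - 1]
    holds the octave-j detail coefficients and approx is the final
    low-pass sequence.
    """
    L = len(h)
    approx = list(x)
    details = []
    for _octave in range(1, J + 1):
        half = len(approx) // 2
        next_approx, detail = [], []
        for n in range(half):
            lo = hi = 0.0
            for k in range(L):
                # Decimation: output n reads input 2n - k. The factor of 2
                # compounds from octave to octave, so octave-j outputs depend
                # on original samples spaced 2**j apart. This is the source
                # of the nonuniform dependence structure.
                idx = 2 * n - k
                if 0 <= idx < len(approx):  # assumption: zeros outside the data
                    lo += h[k] * approx[idx]
                    hi += g[k] * approx[idx]
            next_approx.append(lo)
            detail.append(hi)
        details.append(detail)
        approx = next_approx
    return details, approx
```

For example, with the two-tap filters h = [0.5, 0.5] and g = [0.5, -0.5], calling dwt_octaves([1, 2, 3, 4, 5, 6, 7, 8], h, g, 2) yields 4 detail coefficients at octave 1 and 2 at octave 2. Octave j produces M/2^j outputs, so the total number of outputs across all octaves stays below M however large J is.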