Improving the memory behavior of vertical filtering in the discrete wavelet transform

  • Authors:
  • Asadollah Shahbahrami;Ben Juurlink;Stamatis Vassiliadis

  • Affiliations:
  • Delft University of Technology, The Netherlands;Delft University of Technology, The Netherlands;Delft University of Technology, The Netherlands

  • Venue:
  • Proceedings of the 3rd conference on Computing frontiers
  • Year:
  • 2006

Abstract

The discrete wavelet transform (DWT) is used in several image and video compression standards, in particular JPEG2000. A 2D DWT consists of horizontal filtering along the rows followed by vertical filtering along the columns. It is well known that a straightforward implementation of vertical filtering (assuming a row-major layout) incurs many cache misses because it lacks spatial locality. This can be avoided by interchanging the loops. This paper shows, however, that the resulting implementation suffers significantly from 64K aliasing, which occurs on the Pentium 4 when two accessed data blocks lie a multiple of 64K apart, and we propose two techniques to avoid it. In addition, if the filter has more than four coefficients, the associativity (number of ways) of the Pentium 4's L1 data cache is insufficient to avoid conflict misses. Consequently, we propose two methods for reducing conflict misses. Although the experimental results were collected on the Pentium 4, the techniques are general and can be applied to other processors with different cache organizations as well. Compared to already optimized baseline implementations, the proposed techniques improve the performance of vertical filtering by a factor of 3.11 for the (5,3) lifting scheme, by a factor of 3.11 for Daubechies' transform of four coefficients, and by a factor of 1.99 for the Cohen, Daubechies, and Feauveau 9/7 transform.
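The loop interchange mentioned in the abstract can be illustrated with a minimal C sketch. It is not the authors' code: the function names, the `TAPS` constant, and the plain 4-tap FIR (no subsampling, no boundary handling, no lifting) are assumptions chosen for brevity. The first routine walks down each column of a row-major image, so consecutive accesses are a full row apart; the second keeps the same arithmetic but makes the column index the inner loop, so consecutive accesses are adjacent in memory. The helper `rows_64k_apart` only checks the address arithmetic behind 64K aliasing, assuming 64K means 65536 bytes; it does not reproduce either of the paper's avoidance techniques.

```c
#include <assert.h>
#include <stddef.h>

#define TAPS 4 /* hypothetical filter length, chosen for illustration */

/* Straightforward vertical filtering on a row-major image:
 * the column loop is outermost, so successive reads are `stride`
 * floats apart and exhibit little spatial locality. */
void vfilter_naive(const float *in, float *out, size_t rows, size_t cols,
                   size_t stride, const float coef[TAPS])
{
    assert(stride >= cols);
    for (size_t c = 0; c < cols; c++) {
        for (size_t r = 0; r + TAPS <= rows; r++) {
            float acc = 0.0f;
            for (size_t k = 0; k < TAPS; k++)
                acc += coef[k] * in[(r + k) * stride + c];
            out[r * stride + c] = acc;
        }
    }
}

/* Same computation with the row and column loops interchanged:
 * the inner loop now moves along a row, touching adjacent elements
 * in each of the TAPS input rows and in the output row. */
void vfilter_interchanged(const float *in, float *out, size_t rows,
                          size_t cols, size_t stride,
                          const float coef[TAPS])
{
    assert(stride >= cols);
    for (size_t r = 0; r + TAPS <= rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            float acc = 0.0f;
            for (size_t k = 0; k < TAPS; k++)
                acc += coef[k] * in[(r + k) * stride + c];
            out[r * stride + c] = acc;
        }
    }
}

/* Returns 1 if two rows `row_distance` apart start a nonzero multiple
 * of 65536 bytes apart (the 64K-aliasing condition, assuming 64K
 * denotes 65536 bytes), given a row stride in floats; 0 otherwise. */
int rows_64k_apart(size_t stride_floats, size_t row_distance)
{
    size_t bytes = stride_floats * sizeof(float) * row_distance;
    return bytes != 0 && bytes % 65536 == 0;
}
```

Both routines accumulate the taps in the same order, so they produce identical results; only the memory access order differs. In the interchanged version the inner loop reads from `TAPS` different rows at once, which is where row distances that are a multiple of 64K, or more rows than the cache has ways, can start to matter.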