The discrete wavelet transform (DWT) is used in several image and video compression standards, in particular JPEG2000. A 2D DWT consists of horizontal filtering along the rows followed by vertical filtering along the columns. It is well known that a straightforward implementation of vertical filtering (assuming a row-major layout) incurs many cache misses because it lacks spatial locality. This can be avoided by interchanging the loops. This paper shows, however, that the resulting implementation suffers significantly from 64K aliasing, which occurs on the Pentium 4 when two accessed data blocks are a multiple of 64K apart, and we propose two techniques to avoid it. In addition, if the filter length exceeds four, the associativity (number of ways) of the Pentium 4's L1 data cache is insufficient to avoid conflict misses. We therefore propose two methods for reducing conflict misses. Although the experimental results were collected on the Pentium 4, the techniques are general and can also be applied to processors with different cache organizations. Compared to already optimized baseline implementations, the proposed techniques improve the performance of vertical filtering by a factor of 3.11 for the (5,3) lifting scheme, 3.11 for Daubechies' four-coefficient transform, and 1.99 for the Cohen, Daubechies, and Feauveau 9/7 transform.
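The loop interchange mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's code: the function names are hypothetical, only the predict step of the (5,3) lifting scheme is shown, and it assumes a row-major `int` image with an even number of rows and non-negative sample values. The 64K-aliasing and conflict-miss techniques are not reproduced here, since the abstract does not describe them.

```c
#include <assert.h>
#include <stddef.h>

/* Vertical predict step of the (5,3) lifting scheme on a row-major image:
 *   high[i][c] = img[2i+1][c] - (img[2i][c] + img[2i+2][c]) / 2
 * The row below the last odd row is mirrored (symmetric extension).
 * Assumptions of this sketch: rows is even, samples are non-negative,
 * and high holds rows/2 rows of cols samples. */

/* Straightforward order: the inner loop walks down one column, so
 * consecutive accesses are cols elements apart (poor spatial locality). */
void vpredict_column_order(const int *img, int *high, size_t rows, size_t cols)
{
    assert(rows >= 2 && rows % 2 == 0);
    for (size_t c = 0; c < cols; c++) {
        for (size_t i = 0; i < rows / 2; i++) {
            /* Mirror row index 2i+2 back to 2i at the bottom border. */
            size_t below = (2 * i + 2 < rows) ? 2 * i + 2 : 2 * i;
            int a = img[(2 * i) * cols + c];
            int b = img[below * cols + c];
            high[i * cols + c] = img[(2 * i + 1) * cols + c] - (a + b) / 2;
        }
    }
}

/* Interchanged order: the inner loop walks along rows, so every access
 * stream (even row, odd row, next even row, output row) is contiguous. */
void vpredict_row_order(const int *img, int *high, size_t rows, size_t cols)
{
    assert(rows >= 2 && rows % 2 == 0);
    for (size_t i = 0; i < rows / 2; i++) {
        size_t below = (2 * i + 2 < rows) ? 2 * i + 2 : 2 * i;
        const int *even = img + (2 * i) * cols;
        const int *odd  = img + (2 * i + 1) * cols;
        const int *next = img + below * cols;
        int *out = high + i * cols;
        for (size_t c = 0; c < cols; c++) {
            out[c] = odd[c] - (even[c] + next[c]) / 2;
        }
    }
}
```

Both functions compute the same coefficients; only the traversal order differs. In the interchanged version each filter tap reads its own row as a sequential stream, which is the situation in which, according to the abstract, the distance between those streams and the cache associativity become the limiting factors.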