HiPC '02 Proceedings of the 9th International Conference on High Performance Computing
In this paper we discuss several issues relevant to the vectorization of a 2-D Discrete Wavelet Transform (DWT) on current microprocessors. Our research builds on previous studies of the efficient exploitation of the memory hierarchy, which has a tremendous impact on performance. We extend this work with a more detailed analysis based on hardware performance counters and with a study of vectorization, in particular using the Intel Pentium SSE instruction set. Most of our optimizations are performed at the source-code level to allow automatic vectorization by the compiler, although we have also introduced some compiler intrinsic functions to enhance performance. Given the high level of abstraction at which the optimizations are performed, the results obtained on an Intel Pentium III microprocessor are quite satisfactory, although further improvement could be obtained through more extensive use of compiler intrinsics.
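The abstract combines three ideas. The first is restructuring the DWT loops for the memory hierarchy. The second is writing plain C that a compiler can auto-vectorize. The third is falling back to SSE intrinsics where needed. The following is a minimal sketch of how these ideas might fit together for one level of a 2-D transform, and it is not the authors' code. Several details are illustrative assumptions and are not taken from the paper:

- The transform uses the unnormalized Haar wavelet (average and half-difference), chosen because it is the simplest wavelet.
- The function names `haar_rows_c`, `haar_rows` and `haar2d_level` are hypothetical.
- The vertical pass combines whole pairs of rows instead of walking down columns.

An SSE register holds four single-precision floats, so the intrinsic loop processes four pixels per instruction. The `__SSE__` guard lets the sketch build on targets without SSE.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h> /* SSE intrinsics: _mm_loadu_ps, _mm_add_ps, ... */
#endif

/* Assumption: unnormalized Haar step (average, half-difference), used only
 * as the simplest possible wavelet, not the filter bank from the paper. */

/* Vertical step in plain C: combines row a with the row below it (b).
 * Unit-stride accesses, restrict-qualified pointers and a simple trip
 * count make this loop a candidate for compiler auto-vectorization. */
static void haar_rows_c(const float *restrict a, const float *restrict b,
                        float *restrict lo, float *restrict hi, size_t n)
{
    for (size_t j = 0; j < n; j++) {
        lo[j] = 0.5f * (a[j] + b[j]);
        hi[j] = 0.5f * (a[j] - b[j]);
    }
}

/* Same step with explicit SSE intrinsics: four floats per instruction,
 * unaligned loads/stores, then plain C for the remaining n % 4 elements. */
static void haar_rows(const float *restrict a, const float *restrict b,
                      float *restrict lo, float *restrict hi, size_t n)
{
#if defined(__SSE__)
    const __m128 half = _mm_set1_ps(0.5f);
    size_t j = 0;
    for (; j + 4 <= n; j += 4) {
        __m128 va = _mm_loadu_ps(a + j);
        __m128 vb = _mm_loadu_ps(b + j);
        _mm_storeu_ps(lo + j, _mm_mul_ps(half, _mm_add_ps(va, vb)));
        _mm_storeu_ps(hi + j, _mm_mul_ps(half, _mm_sub_ps(va, vb)));
    }
    haar_rows_c(a + j, b + j, lo + j, hi + j, n - j); /* scalar tail */
#else
    haar_rows_c(a, b, lo, hi, n); /* no SSE: fall back to plain C */
#endif
}

/* One level of the 2-D transform on an n x n row-major image (n even),
 * using tmp (n * n floats) as scratch. Afterwards the top-left
 * n/2 x n/2 quadrant holds the low-pass (LL) band and the other three
 * quadrants hold the detail bands. */
static void haar2d_level(float *img, size_t n, float *tmp)
{
    size_t h = n / 2;

    /* Horizontal pass: each row is split into averages (left half)
     * and half-differences (right half) of adjacent pixel pairs. */
    for (size_t r = 0; r < n; r++) {
        const float *x = img + r * n;
        float *y = tmp + r * n;
        for (size_t k = 0; k < h; k++) {
            y[k]     = 0.5f * (x[2 * k] + x[2 * k + 1]);
            y[h + k] = 0.5f * (x[2 * k] - x[2 * k + 1]);
        }
    }

    /* Vertical pass: rather than walking down each column (stride n,
     * poor cache-line reuse), combine whole pairs of rows so every
     * access is unit-stride. Output row i receives the averages and
     * output row h + i the half-differences of rows 2i and 2i + 1. */
    for (size_t i = 0; i < h; i++)
        haar_rows(tmp + 2 * i * n, tmp + (2 * i + 1) * n,
                  img + i * n, img + (h + i) * n, n);
}
```

The sketch handles only one decomposition level of a square image with an even side length. A multilevel version would repeat the step on the LL quadrant. That would require passing the row stride separately from the width. Whether a given compiler actually vectorizes `haar_rows_c` or the stride-2 horizontal loop depends on the compiler and its flags. The intrinsic version makes the vectorization explicit at the cost of portability.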