The move to multicore processors creates new demands on software development in order to profit from the improved capabilities. Most importantly, algorithms and code must be parallelized wherever possible, but the growing memory wall must also be considered. Additionally, high computational performance can only be reached if architecture-specific features are exploited. To address this complexity, we developed a C++ framework that simplifies the development of performance-optimized, parallel, memory-efficient, stencil-based codes on standard multicore processors and on the heterogeneous Cell processor developed jointly by Sony, Toshiba, and IBM. We illustrate the implementation and optimization of the Fast Wavelet Transform and its inverse for Haar wavelets within our hybrid framework, using OpenMP and the Open Computing Language (OpenCL), and analyze performance results for different platforms.
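As a concrete illustration of the algorithmic core, the sketch below shows one decomposition level of the 1-D Haar Fast Wavelet Transform parallelized with OpenMP. It is a minimal example written under our own assumptions, not the framework's actual API; the function and variable names (haar_forward_level and the like) are hypothetical.

#include <cmath>
#include <cstddef>
#include <vector>

// One level of the 1-D Haar FWT: an input of even length n is mapped to
// n/2 approximation (low-pass) coefficients followed by n/2 detail
// (high-pass) coefficients. Names are illustrative, not part of the framework.
std::vector<double> haar_forward_level(const std::vector<double>& in)
{
    const long long half = static_cast<long long>(in.size() / 2);
    const double inv_sqrt2 = 1.0 / std::sqrt(2.0);
    std::vector<double> out(in.size());

    // Each output pair depends on exactly one input pair, so the loop
    // parallelizes trivially across threads.
    #pragma omp parallel for
    for (long long i = 0; i < half; ++i) {
        const double a = in[2 * i];
        const double b = in[2 * i + 1];
        out[i]        = (a + b) * inv_sqrt2; // approximation coefficient
        out[half + i] = (a - b) * inv_sqrt2; // detail coefficient
    }
    return out;
}

The inverse step recombines each pair, in[2*i] = (out[i] + out[half + i]) * inv_sqrt2 and in[2*i + 1] = (out[i] - out[half + i]) * inv_sqrt2, and is equally data-parallel, which is what makes the transform a good fit for OpenMP and OpenCL back ends alike.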