Autotuning of adaptive mesh refinement PDE solvers on shared memory architectures

Authors:
Svetlana Nogina; Kristof Unterweger; Tobias Weinzierl
Affiliations:
Technische Universität München, Garching, Germany; Technische Universität München, Garching, Germany; Technische Universität München, Garching, Germany
Venue:
PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I
Year:
2011



Abstract

Many multithreaded, grid-based, dynamically adaptive solvers for partial differential equations repeatedly have to traverse subgrids (patches) of different and changing sizes. The parallel efficiency of this traversal depends on the interplay of the patch size, the architecture used, the operations triggered throughout the traversal, and the grain size, i.e. the size of the subtasks the patch is broken into. We propose an oracle mechanism that delivers grain sizes on the fly. It takes into account historical runtime measurements for different patch and grain sizes as well as the traversal's operations, and it yields reasonable speedups. Neither magic configuration settings nor an expensive pre-tuning phase is necessary: it is an autotuning approach.
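
The oracle idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe the search strategy, so the code assumes a simple explore-then-exploit scheme. The names `GrainSizeOracle`, `suggest`, `report`, `candidates` and `fake_runtime` are hypothetical, as are the power-of-two candidate grain sizes and the toy cost model that stands in for real runtime measurements.

```python
from collections import defaultdict


class GrainSizeOracle:
    """Hypothetical oracle: picks a grain size per (operation, patch size) from past timings."""

    def __init__(self, min_samples=2):
        self.min_samples = min_samples
        # (operation, patch_size) -> {grain_size: [sample_count, mean_seconds]}
        self._stats = defaultdict(dict)

    @staticmethod
    def candidates(patch_size):
        """Assumed search space: powers of two below patch_size, plus patch_size (serial)."""
        grains = []
        g = 1
        while g < patch_size:
            grains.append(g)
            g *= 2
        grains.append(patch_size)
        return grains

    def suggest(self, operation, patch_size):
        """Return the grain size to use for the next traversal of such a patch."""
        stats = self._stats[(operation, patch_size)]
        options = self.candidates(patch_size)
        # Exploration: try any candidate that still has too few measurements.
        for grain in options:
            if stats.get(grain, [0, 0.0])[0] < self.min_samples:
                return grain
        # Exploitation: pick the candidate with the lowest mean runtime so far.
        return min(options, key=lambda grain: stats[grain][1])

    def report(self, operation, patch_size, grain, seconds):
        """Feed one runtime measurement back into the history."""
        entry = self._stats[(operation, patch_size)].setdefault(grain, [0, 0.0])
        entry[0] += 1
        entry[1] += (seconds - entry[1]) / entry[0]  # incremental mean


def fake_runtime(patch_size, grain):
    """Toy cost model (assumption): one unit of overhead per task plus the chunk's work."""
    tasks = -(-patch_size // grain)  # ceiling division
    return 1.0 * tasks + grain


if __name__ == "__main__":
    oracle = GrainSizeOracle()
    for _ in range(20):
        grain = oracle.suggest("smoother", 64)
        oracle.report("smoother", 64, grain, fake_runtime(64, grain))
    print(oracle.suggest("smoother", 64))
```

In a real solver, the value passed to `report` would be the wall-clock time measured for the traversal, and the suggested grain size would be handed to the threading layer (for example as a chunk size). Keying the history by both operation and patch size reflects the abstract's point that the best grain size depends on their interplay.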