Overlapping communication with computation using OpenMP tasks on the GTS magnetic fusion code

Authors:
Robert Preissl;Alice Koniges;Stephan Ethier;Weixing Wang;Nathan Wichmann
Affiliations:
(Corresponding author. Tel.: +1 510 486 6421, Fax: +1 510 486 4316, E-mail: rpreissl@lbl.gov) Lawrence Berkeley National Laboratory, Berkeley, CA, USA;Lawrence Berkeley National Laboratory, Berkeley, CA, USA;Princeton Plasma Physics Laboratory, Princeton, NJ, USA;Princeton Plasma Physics Laboratory, Princeton, NJ, USA;Cray Inc., St. Paul, MN, USA
Venue:
Scientific Programming - Exploring Languages for Expressing Medium to Massive On-Chip Parallelism
Year:
2010


Abstract

Application codes in a variety of areas are being updated for performance on the latest architectures. In this paper we examine how to accelerate an application from magnetic fusion, with particular emphasis on methods that are applicable to many-core, multicore and future architectural designs. We take an important magnetic fusion particle code, GTS, that already includes several levels of parallelism, including a hybrid model that combines MPI with OpenMP. We study how to add new advanced hybrid models that extend the applicability of OpenMP tasks and exploit multi-threaded MPI support to overlap communication and computation. Experiments carried out on Cray XT4 and XT5 machines show a speed-up of up to 35% for the investigated GTS particle shifter kernel, demonstrating the benefits and applicability of this approach.
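
The overlap idea in the abstract can be illustrated with a minimal C sketch using OpenMP tasks. It is not taken from GTS: the function names (exchange_particles, local_compute, overlap_step) are hypothetical, and the communication phase is replaced by a plain memory copy so that the example stays self-contained. In a real hybrid code that task would presumably issue MPI calls, which requires initializing MPI with a thread support level that permits calls from the executing thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the communication phase. In a hybrid MPI+OpenMP
   code this would be a send/receive of particle data; here it is only a
   memory copy so the sketch needs no MPI library. */
static void exchange_particles(const double *send, double *recv, size_t n) {
    memcpy(recv, send, n * sizeof *send);
}

/* Independent local work that does not touch the communication buffers. */
static double local_compute(const double *x, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum += x[i] * x[i];
    }
    return sum;
}

/* One thread creates two tasks; idle threads of the team can pick them up,
   so the (stand-in) communication can proceed while the computation runs. */
double overlap_step(const double *send, double *recv, size_t n_comm,
                    const double *local, size_t n_local) {
    assert(send != NULL && recv != NULL && local != NULL);
    double result = 0.0;

    #pragma omp parallel
    #pragma omp single
    {
        /* Communication task. */
        #pragma omp task
        exchange_particles(send, recv, n_comm);

        /* Computation task; it is independent of the buffers used above. */
        #pragma omp task shared(result)
        result = local_compute(local, n_local);

        /* Wait for both tasks before any code that depends on either. */
        #pragma omp taskwait
    }
    return result;
}
```

Calling overlap_step with a four-element send buffer and the local array {1, 2, 3} fills the receive buffer and returns the sum of squares of the local array. When compiled without OpenMP support the pragmas are ignored and the two phases simply run one after the other, which makes the sketch easy to check for correctness before measuring any overlap.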