Gyrokinetic toroidal simulations on leading multi- and manycore HPC systems

Authors:
Kamesh Madduri;Khaled Z. Ibrahim;Samuel Williams;Eun-Jin Im;Stephane Ethier;John Shalf;Leonid Oliker
Affiliations:
NERSC/CRD, Lawrence Berkeley National Laboratory, Berkeley;NERSC/CRD, Lawrence Berkeley National Laboratory, Berkeley;NERSC/CRD, Lawrence Berkeley National Laboratory, Berkeley;Kookmin University, Seoul, Korea;Princeton Plasma Physics Laboratory, Princeton;NERSC/CRD, Lawrence Berkeley National Laboratory, Berkeley;NERSC/CRD, Lawrence Berkeley National Laboratory, Berkeley
Venue:
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Year:
2011

Abstract

The gyrokinetic Particle-in-Cell (PIC) method is a critical computational tool enabling petascale fusion simulation research. In this work, we present novel multi- and manycore-centric optimizations to enhance the performance of GTC, a PIC-based production code for studying plasma microturbulence in tokamak devices. Our optimizations encompass all six GTC subroutines and include multi-level particle and grid decompositions designed to improve multi-node parallel scaling, particle binning for improved load balance, GPU acceleration of key subroutines, and memory-centric optimizations to improve single-node scaling and reduce memory utilization. The new hybrid MPI-OpenMP and MPI-OpenMP-CUDA GTC versions achieve up to a 2× speedup over the production Fortran code on four parallel systems: clusters based on the AMD Magny-Cours, Intel Nehalem-EP, IBM BlueGene/P, and NVIDIA Fermi architectures. Finally, strong scaling experiments provide insight into parallel scalability, memory utilization, and programmability trade-offs for large-scale gyrokinetic PIC simulations, while attaining a 1.6× speedup on 49,152 XE6 cores.
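
The particle binning and particle-to-grid steps named in the abstract can be illustrated with a small sketch. It is not GTC's implementation: it assumes a periodic 1D grid with unit cell spacing, non-negative particle positions, and unit charges, and the function names `bin_particles` and `deposit_charge` are hypothetical. Binning is done here with a counting sort on the cell index, and deposition uses simple linear weighting rather than a gyrokinetic scheme.

```python
from typing import List, Tuple


def bin_particles(positions: List[float], ncell: int) -> Tuple[List[float], List[int]]:
    """Counting sort of particles by the grid cell that contains them.

    Assumes positions are non-negative and the grid has unit spacing.
    Returns the reordered positions and per-cell start offsets.
    """
    counts = [0] * ncell
    for x in positions:
        counts[int(x) % ncell] += 1

    # Exclusive prefix sum: offsets[c] is where cell c starts in the output.
    offsets = [0] * (ncell + 1)
    for c in range(ncell):
        offsets[c + 1] = offsets[c] + counts[c]

    binned = [0.0] * len(positions)
    cursor = offsets[:-1].copy()  # next free slot in each cell's range
    for x in positions:
        c = int(x) % ncell
        binned[cursor[c]] = x
        cursor[c] += 1
    return binned, offsets


def deposit_charge(positions: List[float], ncell: int) -> List[float]:
    """Scatter unit charges onto a periodic 1D grid with linear weighting."""
    grid = [0.0] * ncell
    for x in positions:
        left = int(x) % ncell
        frac = x - int(x)              # distance from the left grid point
        grid[left] += 1.0 - frac
        grid[(left + 1) % ncell] += frac
    return grid


if __name__ == "__main__":
    particles = [2.5, 0.25, 3.75, 0.5, 2.0]
    binned, offsets = bin_particles(particles, ncell=4)
    print(binned, offsets)
    print(deposit_charge(binned, ncell=4))
```

After binning, particles that update the same grid points sit next to each other in memory, which is the locality property that binning is meant to provide in this toy setting. The deposited charge does not depend on particle order, so binning changes only the access pattern, not the result.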