Gyrokinetic particle-in-cell optimization on emerging multi- and manycore platforms

Authors:
Kamesh Madduri;Eun-Jin Im;Khaled Z. Ibrahim;Samuel Williams;Stéphane Ethier;Leonid Oliker
Affiliations:
Computational Research Division, Lawrence Berkeley National Laboratory, CA 94720, United States;School of Computer Science, Kookmin University, Seoul 136-702, Republic of Korea;Computational Research Division, Lawrence Berkeley National Laboratory, CA 94720, United States;Computational Research Division, Lawrence Berkeley National Laboratory, CA 94720, United States;Princeton Plasma Physics Laboratory, Princeton, NJ 08543, United States;Computational Research Division, Lawrence Berkeley National Laboratory, CA 94720, United States
Venue:
Parallel Computing
Year:
2011

Abstract

The next decade of high-performance computing (HPC) systems will see a rapid evolution and divergence of multi- and manycore architectures as power and cooling constraints limit increases in microprocessor clock speeds. Understanding efficient optimization methodologies on diverse multicore designs in the context of demanding numerical methods is one of the greatest challenges faced today by the HPC community. In this work, we examine the efficient multicore optimization of GTC, a petascale gyrokinetic toroidal fusion code for studying plasma microturbulence in tokamak devices. For GTC's key computational components (charge deposition and particle push), we explore efficient parallelization strategies across a broad range of emerging multicore designs, including the recently released Intel Nehalem-EX, the AMD Opteron Istanbul, and the highly multithreaded Sun UltraSPARC T2+. We also present the first study on tuning gyrokinetic particle-in-cell (PIC) algorithms for graphics processors, using the NVIDIA C2050 (Fermi). Our work discusses several novel optimization approaches for gyrokinetic PIC, including mixed-precision computation, particle binning and decomposition strategies, grid replication, SIMDized atomic floating-point operations, and effective GPU texture memory utilization. Overall, we achieve significant performance improvements of 1.3x to 4.7x on these complex PIC kernels, despite the inherent challenges of data dependency and locality. Our work also points to several architectural and programming features that could significantly enhance PIC performance and productivity on next-generation architectures.
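
The charge deposition step and the grid replication strategy named in the abstract can be illustrated with a much simplified sketch. The code below is not GTC's kernel: it assumes a periodic 1D grid with linear weighting rather than gyro-averaged deposition on a toroidal mesh, and the names `deposit`, `deposit_replicated`, and `n_workers` are hypothetical. It only shows why scatter-adds conflict when run concurrently (two particles can update the same grid point) and how a private grid copy per worker, followed by a reduction, avoids that conflict at the cost of extra memory.

```python
import math


def deposit(positions, charges, n_cells, dx=1.0):
    """Scatter particle charge onto a periodic 1D grid with linear weighting."""
    grid = [0.0] * n_cells
    for x, q in zip(positions, charges):
        s = x / dx
        i = math.floor(s)          # index of the grid point to the left
        w = s - i                  # fractional distance to that point
        # Both updates are read-modify-write operations: if two threads
        # handled particles in the same cell, these would race.
        grid[i % n_cells] += q * (1.0 - w)
        grid[(i + 1) % n_cells] += q * w
    return grid


def deposit_replicated(positions, charges, n_cells, n_workers, dx=1.0):
    """Grid replication: each (hypothetical) worker fills a private grid copy.

    The workers run sequentially here; the point is that no two workers
    ever write to the same array, so no atomics or locks would be needed.
    """
    copies = [
        deposit(positions[w::n_workers], charges[w::n_workers], n_cells, dx)
        for w in range(n_workers)
    ]
    # Reduction step: sum the private copies point by point.
    return [sum(column) for column in zip(*copies)]
```

The trade-off in this sketch is memory for synchronization: replication multiplies grid storage by the number of workers, which is one reason alternatives such as atomic floating-point updates or particle binning become attractive as core counts grow.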