The next decade of high-performance computing (HPC) systems will see a rapid evolution and divergence of multi- and manycore architectures as power and cooling constraints limit increases in microprocessor clock speeds. Understanding how to optimize demanding numerical methods efficiently across diverse multicore designs is one of the greatest challenges facing the HPC community today. In this work, we examine the efficient multicore optimization of GTC, a petascale gyrokinetic toroidal fusion code for studying plasma microturbulence in tokamak devices. For GTC's key computational components (charge deposition and particle push), we explore efficient parallelization strategies across a broad range of emerging multicore designs, including the recently released Intel Nehalem-EX, the AMD Opteron Istanbul, and the highly multithreaded Sun UltraSPARC T2+. We also present the first study on tuning gyrokinetic particle-in-cell (PIC) algorithms for graphics processors, using the NVIDIA C2050 (Fermi). Our work discusses several novel optimization approaches for gyrokinetic PIC, including mixed-precision computation, particle binning and decomposition strategies, grid replication, SIMDized atomic floating-point operations, and effective GPU texture memory utilization. Overall, we achieve performance improvements of 1.3x to 4.7x on these complex PIC kernels, despite the inherent challenges of data dependency and locality. Our work also points to several architectural and programming features that could significantly enhance PIC performance and productivity on next-generation architectures.
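To make the data-dependency problem in charge deposition concrete, the following is a minimal sketch of two of the strategies named above, grid replication and particle binning. It is not GTC's implementation: it assumes a periodic 1D grid with unit spacing and simple linear weighting rather than the gyrokinetic four-point average, it emulates the parallel workers with a serial loop, and the function names (`deposit`, `deposit_replicated`, `bin_particles`) are hypothetical names chosen for illustration.

```python
from math import floor


def deposit(positions, charges, n_cells):
    """Scatter each particle's charge to its two nearest grid points.

    Two particles in the same cell update the same grid entries, which is
    the write conflict that makes a naive parallel loop unsafe.
    """
    grid = [0.0] * n_cells
    for x, q in zip(positions, charges):
        i = floor(x) % n_cells          # left grid point (periodic)
        w = x - floor(x)                # fractional distance from it
        grid[i] += q * (1.0 - w)
        grid[(i + 1) % n_cells] += q * w
    return grid


def deposit_replicated(positions, charges, n_cells, n_workers):
    """Grid replication: one private grid per worker, then a reduction.

    The workers are emulated serially here; in a threaded code each
    replica would be filled concurrently with no atomics needed.
    """
    replicas = [
        deposit(positions[w::n_workers], charges[w::n_workers], n_cells)
        for w in range(n_workers)
    ]
    return [sum(column) for column in zip(*replicas)]


def bin_particles(positions, n_cells):
    """Counting sort of particle indices by cell.

    Returns the particle indices ordered by cell, so that consecutive
    particles touch neighbouring grid entries (better locality).
    """
    offsets = [0] * (n_cells + 1)
    for x in positions:
        offsets[floor(x) % n_cells + 1] += 1
    for c in range(n_cells):            # prefix sum gives each cell's start
        offsets[c + 1] += offsets[c]
    order = [0] * len(positions)
    for p, x in enumerate(positions):
        c = floor(x) % n_cells
        order[offsets[c]] = p
        offsets[c] += 1
    return order


positions = [0.25, 3.5, 1.75, 0.5, 2.0, 3.75]
charges = [1.0] * len(positions)
serial = deposit(positions, charges, 4)
replicated = deposit_replicated(positions, charges, 4, 3)
order = bin_particles(positions, 4)
```

The trade-off the sketch exposes is the one the abstract alludes to: replication removes write conflicts at the cost of memory and a reduction pass that grow with the number of workers, whereas binning keeps a single grid but requires a sort whenever particles move between cells. Atomic floating-point updates are the third option, keeping one grid and no sort but serializing conflicting writes.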