Journal of Computational Physics
We present a particle-in-cell simulation of the relativistic Kelvin-Helmholtz instability (KHI) that, for the first time, delivers angularly resolved radiation spectra of the particle dynamics during the formation of the KHI. This enables studying the formation of the KHI with unprecedented spatial, angular and spectral resolution. By relating the particle motion observed in the KHI to its radiation signature, our results are of great importance for understanding astrophysical jet formation and comparable plasma phenomena. The methods presented here for implementing the particle-in-cell algorithm on graphics processing units can be directly adapted to any many-core parallelization of the particle-mesh method. With these methods we see a peak performance of 7.176 PFLOP/s (double-precision) plus 1.449 PFLOP/s (single-precision), an efficiency of 96% when weakly scaling from 1 to 18432 nodes, and an efficiency of 68.92% with a speedup of 794 (ideal: 1152) when strongly scaling from 16 to 18432 nodes.
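The strong-scaling figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes the usual definitions, which the abstract does not spell out: ideal speedup is the ratio of node counts, and efficiency is measured speedup divided by ideal speedup. The function name `strong_scaling` is hypothetical and chosen for illustration only.

```python
def strong_scaling(base_nodes, scaled_nodes, measured_speedup):
    """Return (ideal speedup, parallel efficiency) for a strong-scaling run.

    Assumes ideal speedup equals the node-count ratio and that
    efficiency = measured speedup / ideal speedup.
    """
    ideal = scaled_nodes / base_nodes
    efficiency = measured_speedup / ideal
    return ideal, efficiency


# Values taken from the abstract: 16 -> 18432 nodes, measured speedup 794.
ideal, efficiency = strong_scaling(16, 18432, 794)
print(f"ideal speedup: {ideal:.0f}")
print(f"efficiency: {efficiency:.2%}")
```

Under these assumptions the node-count ratio reproduces the stated ideal speedup of 1152, and 794 / 1152 matches the stated 68.92% efficiency.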