Application codes in a variety of areas are being updated for performance on the latest architectures. In this paper we examine an application from magnetic fusion for performance acceleration, with particular emphasis on methods that are applicable to many-core and multicore systems and to future architectural designs. We take an important magnetic fusion particle code that already includes several levels of parallelism, including hybrid MPI combined with OpenMP. We study how to include new advanced hybrid models that extend the applicability of OpenMP tasks and exploit multi-threaded MPI support to overlap communication and computation. Experiments carried out on Cray XT4 and XT5 machines show the benefits and applicability of this approach, with a speed-up of up to 35% for the investigated GTS particle shifter kernel.