Different programming paradigms utilize a variety of collective communication operations, often with different semantics. We present the component collective messaging interface (CCMI), which supports asynchronous non-blocking collectives and is extensible to different programming paradigms and architectures. CCMI is built from components written in C++, making it reusable and extensible. Collective algorithms are expressed as topological schedules, which are carried out by executors. Portability across architectures is provided by the multisend data-movement component. CCMI also includes a programming-language adaptor layer used to implement the different APIs and semantics required by each paradigm. We study the effectiveness of CCMI on 16K nodes of a Blue Gene/P machine and evaluate its performance for the barrier, broadcast, and allreduce collective operations and several application benchmarks. We also present the performance of the barrier collective on the Abe InfiniBand cluster.