In this paper we make the case for adding nonblocking collective operations to the MPI standard. The nonblocking point-to-point and blocking collective operations currently defined by MPI provide important performance and abstraction benefits. To allow these benefits to be realized simultaneously, we present an application programming interface for nonblocking collective operations in MPI. Microbenchmark and application-based performance results demonstrate that nonblocking collective operations offer not only greater convenience but also better performance than manual use of threads with blocking collectives.