GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems
SIAM Journal on Scientific and Statistical Computing
CGS, a fast Lanczos-type solver for nonsymmetric linear systems
SIAM Journal on Scientific and Statistical Computing
SIAM Journal on Scientific and Statistical Computing
LogGP: incorporating long messages into the LogP model for parallel computation
Journal of Parallel and Distributed Computing
Multigrid
MPI/RT --- An Emerging Standard for High-Performance Real-Time Systems
HICSS '98 Proceedings of the Thirty-First Annual Hawaii International Conference on System Sciences - Volume 3
An Evaluation of Current High-Performance Networks
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
A Framework for Collective Personalized Communication
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
Send-receive considered harmful: Myths and realities of message passing
ACM Transactions on Programming Languages and Systems (TOPLAS)
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Optimization of MPI collective communication on BlueGene/L systems
Proceedings of the 19th annual international conference on Supercomputing
Automatic generation and tuning of MPI collective communication routines
Proceedings of the 19th annual international conference on Supercomputing
Optimizing All-to-All Collective Communication by Exploiting Concurrency in Modern Networks
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Fast barrier synchronization for InfiniBand™
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A case for non-blocking collective operations
ISPA'06 Proceedings of the 2006 international conference on Frontiers of High Performance Computing and Networking
Advanced collective communication in aspen
Proceedings of the 22nd annual international conference on Supercomputing
Leveraging non-blocking collective communication in high-performance applications
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Sparse Non-blocking Collectives in Quantum Mechanical Calculations
Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Overlapping communication and computation by using a hybrid MPI/SMPSs approach
Proceedings of the 24th ACM International Conference on Supercomputing
Design and evaluation of nonblocking collective I/O operations
EuroMPI'11 Proceedings of the 18th European MPI Users' Group conference on Recent advances in the message passing interface
Towards autotuning by alternating communication methods
Proceedings of the second international workshop on Performance modeling, benchmarking and simulation of high performance computing systems
A parallel solution of large-scale heat equation based on distributed memory hierarchy system
ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part II
Towards autotuning by alternating communication methods
ACM SIGMETRICS Performance Evaluation Review
First observations using nonblocking collectives in MVAPICH
Proceedings of the 20th European MPI Users' Group Meeting
Hi-index | 0.00 |
This paper presents a case study that analyzes the suitability and usage of non-blocking collective operations in parallel applications. As with their point-to-point counterparts, non-blocking collective operations provide the ability to overlap communication with computation and to avoid unnecessary synchronization. These operations are provided for MPI programs with LibNBC, a portable low-overhead implementation of non-blocking collective operations built on MPI-1. The straightforward applicability of the LibNBC is demonstrated by incorporating non-blocking collective operations into a parallel conjugate gradient solver. Although only minor changes are required to use them, non-blocking collective operations allow most of the communication costs to be hidden and provide performance improvements of up to 34%. We also show that, because of overlap, there is no significant performance difference between Gigabit Ethernet and InfiniBandTM for special cases of our calculation.