International Journal of Parallel Programming
LogP: towards a realistic model of parallel computation
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Fast, contention-free combining tree barriers for shared-memory multiprocessors
International Journal of Parallel Programming
Basic Linear Algebra Subprograms for Fortran Usage
ACM Transactions on Mathematical Software (TOMS)
Automatically tuned collective communications
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
COMB: A Portable Benchmark Suite for Assessing MPI Overlap
CLUSTER '02 Proceedings of the IEEE International Conference on Cluster Computing
MPI/RT --- An Emerging Standard for High-Performance Real-Time Systems
HICSS '98 Proceedings of the Thirty-First Annual Hawaii International Conference on System Sciences - Volume 3
A Framework for Collective Personalized Communication
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
International Journal of High Performance Computing Applications
A Practical Approach to the Rating of Barrier Algorithms Using the LogP Model and Open MPI
ICPPW '05 Proceedings of the 2005 International Conference on Parallel Processing Workshops
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
High performance RDMA-based MPI implementation over infiniBand
International Journal of Parallel Programming - Special issue I: The 17th annual international conference on supercomputing (ICS'03)
Operating system issues for petascale systems
ACM SIGOPS Operating Systems Review
Assessing Single-Message and Multi-Node Communication Performance of InfiniBand
PARELEC '06 Proceedings of the international symposium on Parallel Computing in Electrical Engineering
The development and integration of a distributed 3D FFT for a cluster of workstations
ALS'00 Proceedings of the 4th annual Linux Showcase & Conference - Volume 4
Fast barrier synchronization for InfiniBand™
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Optimizing bandwidth limited problems using one-sided communication and overlap
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A case for non-blocking collective operations
ISPA'06 Proceedings of the 2006 international conference on Frontiers of High Performance Computing and Networking
Issues in developing a thread-safe MPI implementation
EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface
Optimizing a conjugate gradient solver with non-blocking collective operations
EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface
Analysis of the memory registration process in the mellanox infiniband software stack
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Advanced collective communication in aspen
Proceedings of the 22nd annual international conference on Supercomputing
Leveraging non-blocking collective communication in high-performance applications
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Architecture of the Component Collective Messaging Interface
Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Sparse Non-blocking Collectives in Quantum Mechanical Calculations
Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Communication Optimization for Medical Image Reconstruction Algorithms
Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
MPI-aware compiler optimizations for improving communication-computation overlap
Proceedings of the 23rd international conference on Supercomputing
Towards Efficient MapReduce Using MPI
Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Scalable communication protocols for dynamic sparse data exchange
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Architecture of the Component Collective Messaging Interface
International Journal of High Performance Computing Applications
AM++: a generalized active message framework
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Design of kernel-level asynchronous collective communication
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
Network offloaded hierarchical collectives using ConnectX-2's CORE-Direct capabilities
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
Overlapping communication with computation using OpenMP tasks on the GTS magnetic fusion code
Scientific Programming - Exploring Languages for Expressing Medium to Massive On-Chip Parallelism
Kanor: a declarative language for explicit communication
PADL'11 Proceedings of the 13th international conference on Practical aspects of declarative languages
Active pebbles: parallel programming for data-driven applications
Proceedings of the international conference on Supercomputing
Hiding latency in Coarray Fortran 2.0
Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model
Kernel-based offload of collective operations: implementation, evaluation and lessons learned
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Design and evaluation of nonblocking collective I/O operations
EuroMPI'11 Proceedings of the 18th European MPI Users' Group conference on Recent advances in the message passing interface
pupyMPI - MPI implemented in pure python
EuroMPI'11 Proceedings of the 18th European MPI Users' Group conference on Recent advances in the message passing interface
Writing parallel libraries with MPI - common practice, issues, and extensions
EuroMPI'11 Proceedings of the 18th European MPI Users' Group conference on Recent advances in the message passing interface
Composable, non-blocking collective operations on power7 IH
Proceedings of the 26th ACM international conference on Supercomputing
Delta Send-Recv for Dynamic Pipelining in MPI Programs
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
MPI runtime error detection with MUST: advances in deadlock detection
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Optimization principles for collective neighborhood communications
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
A case for standard non-blocking collective operations
PVM/MPI'07 Proceedings of the 14th European conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Parallel simulation on supercomputers
Proceedings of the Winter Simulation Conference
The impact of system design parameters on application noise sensitivity
Cluster Computing
Bandwidth-optimal all-to-all exchanges in fat tree networks
Proceedings of the 27th international ACM conference on International conference on supercomputing
Expressing graph algorithms using generalized active messages
Proceedings of the 27th international ACM conference on International conference on supercomputing
Parallel bucket-brigade communication interface for scientific applications
Proceedings of the 20th European MPI Users' Group Meeting
First observations using nonblocking collectives in MVAPICH
Proceedings of the 20th European MPI Users' Group Meeting
Designing and auto-tuning parallel 3-D FFT for computation-communication overlap
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
MPI runtime error detection with MUST: Advances in deadlock detection
Scientific Programming - Selected Papers from Super Computing 2012
Hi-index | 0.00 |
Collective operations and non-blocking point-to-point operations have always been part of MPI. Although non-blocking collective operations are an obvious extension to MPI, there have been no comprehensive studies of this functionality. In this paper we present LibNBC, a portable high-performance library for implementing non-blocking collective MPI communication operations. LibNBC provides non-blocking versions of all MPI collective operations, is layered on top of MPI-1, and is portable to nearly all parallel architectures. To measure the performance characteristics of our implementation, we also present a microbenchmark for measuring both latency and overlap of computation and communication. Experimental results demonstrate that the blocking performance of the collective operations in our library is comparable to that of collective operations in other high-performance MPI implementations. Our library introduces a very low overhead between the application and the underlying MPI and thus, in conjunction with the potential to overlap communication with computation, offers the potential for optimizing real-world applications.