As high-end computing systems continue to grow in scale, the performance that applications can achieve on them depends heavily on their ability to avoid explicitly synchronized communication with other processes in the system. Accordingly, several modern and legacy parallel programming models (such as MPI, UPC, and Global Arrays) provide programming constructs that enable implicit communication through one-sided communication operations. Although MPI is the most widely used communication model for scientific computing, its one-sided communication sees limited use, mainly because current MPI implementations internally rely on synchronization between processes even during one-sided communication and thus lose the potential of such constructs. In our previous work, we used the native one-sided communication primitives offered by high-speed networks such as InfiniBand (IB) to allow for true one-sided communication in MPI. In this paper, we extend that work to natively take advantage of one-sided atomic operations on cache-coherent multi-core/multi-processor architectures while still utilizing the benefits of networks such as IB. Specifically, we present a sophisticated hybrid design that uses locks that migrate between IB hardware atomics and multi-core CPU atomics to take advantage of both. We demonstrate the capability of our proposed design with a wide range of experiments illustrating its benefits in performance as well as its potential to avoid explicit synchronization.