Optimizing MPI one-sided communication on multi-core InfiniBand clusters using shared memory backed windows

  • Authors:
  • Sreeram Potluri;Hao Wang;Vijay Dhanraj;Sayantan Sur;Dhabaleswar K. Panda

  • Affiliations:
Department of Computer Science and Engineering, The Ohio State University (all authors)

  • Venue:
  • EuroMPI'11 Proceedings of the 18th European MPI Users' Group conference on Recent advances in the message passing interface
  • Year:
  • 2011


Abstract

The Message Passing Interface (MPI) has been very popular for programming parallel scientific applications. As multi-core architectures have become prevalent, a major question that has emerged is how MPI should be used within a compute node and what impact this has on communication costs. The one-sided communication interface in MPI provides a mechanism to reduce communication costs by removing the matching requirements of the send/receive model. The MPI standard provides the flexibility to allocate memory windows backed by shared memory. However, state-of-the-art open-source MPI libraries do not leverage this optimization opportunity on commodity clusters. In this paper, we present a design and implementation of the intra-node MPI one-sided interface using shared memory backed windows on multi-core clusters. We use the MVAPICH2 MPI library for our design, implementation and evaluation. Micro-benchmark evaluation shows that the new design improves Put, Get and Accumulate latencies by up to 85% with passive synchronization. The bandwidth of Put and Get improves by 64% and 42%, respectively. The SPLASH LU benchmark shows an improvement of up to 55% with the new design on a 32-core Magny-Cours node, and a similar improvement on a 12-core Westmere node. The mean BFS time in Graph500 is reduced by 39% and 77% on the Magny-Cours and Westmere nodes, respectively.
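
To make the one-sided interface concrete, the following is a minimal sketch (not taken from the paper) of an intra-node MPI Put under passive-target synchronization. The window buffer is requested through MPI_Alloc_mem, which gives an MPI library such as MVAPICH2 the opportunity to back the window with shared memory; the window size of one integer per rank and the choice of rank 0 as the target are illustrative assumptions, not details from the paper.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    int *win_buf;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Ask MPI for the window memory so the library is free to place it
       in shared memory for ranks on the same node. */
    MPI_Alloc_mem(nprocs * sizeof(int), MPI_INFO_NULL, &win_buf);
    for (int i = 0; i < nprocs; i++)
        win_buf[i] = -1;

    MPI_Win_create(win_buf, nprocs * sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* Passive-target epoch: the target (rank 0) makes no matching call. */
    MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);
    MPI_Put(&rank, 1, MPI_INT, 0, rank, 1, MPI_INT, win);
    MPI_Win_unlock(0, win);

    MPI_Barrier(MPI_COMM_WORLD);

    if (rank == 0) {
        /* Access the local window under a lock for portability. */
        MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);
        for (int i = 0; i < nprocs; i++)
            printf("slot %d = %d\n", i, win_buf[i]);
        MPI_Win_unlock(0, win);
    }

    MPI_Win_free(&win);
    MPI_Free_mem(win_buf);
    MPI_Finalize();
    return 0;
}

With passive synchronization the target rank makes no matching call during the epoch; each origin rank completes its Put independently, which is the synchronization mode used for the latency results quoted above.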