Multi-core shared-memory architectures are ubiquitous in both High-Performance Computing (HPC) and commodity systems because they offer an excellent trade-off between performance and programmability. MPI's abstraction of explicit communication across distributed memory is very popular for programming scientific applications. Unfortunately, OS-level process separation forces MPI to copy messages unnecessarily within shared-memory nodes. This paper presents a novel approach that transparently shares memory across MPI processes executing on the same node, allowing them to communicate like threaded applications. While prior work has explored thread-based MPI libraries, we demonstrate that such approaches are impractical and perform poorly in practice. We instead propose a novel process-based approach that enables shared-memory communication and integrates with existing MPI libraries and applications without modification. Our protocols for shared-memory message passing exhibit better performance and a reduced cache footprint. Communication speedups of more than 26% are demonstrated for two applications.