Improving MPI communication overlap with collaborative polling

Authors:
Sylvain Didelot;Patrick Carribault;Marc Pérache;William Jalby
Affiliations:
Exascale Computing Research Center, Versailles, France and Université de Versailles Saint-Quentin-en-Yvelines (UVSQ), Versailles, France;Exascale Computing Research Center, Versailles, France and Université de Versailles Saint-Quentin-en-Yvelines (UVSQ), Versailles, France and CEA, DAM, DIF, Arpajon, France F-91297;Exascale Computing Research Center, Versailles, France and Université de Versailles Saint-Quentin-en-Yvelines (UVSQ), Versailles, France and CEA, DAM, DIF, Arpajon, France F-91297;Exascale Computing Research Center, Versailles, France and Université de Versailles Saint-Quentin-en-Yvelines (UVSQ), Versailles, France
Venue:
Computing
Year:
2014

Abstract

As parallel applications grow more complex, their demand for computational power keeps increasing. Recent trends in High-Performance Computing (HPC) show that improvements in single-core performance will not be sufficient to meet the challenges of an exascale machine: we expect an enormous growth in the number of cores as well as a multiplication of the data volume exchanged across compute nodes. To scale applications up to exascale, the communication layer has to minimize the time spent waiting for network messages. This paper presents a message progression mechanism based on Collaborative Polling, which allows efficient, auto-adaptive overlap of communication phases with computation. The approach is new in that it increases the application's overlap potential without introducing the overheads of threaded message progression. We implemented our approach for InfiniBand in a thread-based MPI runtime called MPC. We evaluate the gain from Collaborative Polling on the NAS Parallel Benchmarks and three scientific applications, where we show significant improvements in communication times, by up to a factor of 2.