Adaptive connection management for scalable MPI over InfiniBand

  • Authors:
  • Weikuan Yu; Qi Gao; Dhabaleswar K. Panda

  • Affiliations:
  • Network-Based Computing Lab, Dept. of Computer Sci. & Engineering, The Ohio State University (all authors)

  • Venue:
  • IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
  • Year:
  • 2006


Abstract

Supporting scalable and efficient parallel programs is a major challenge in parallel computing, given the widespread adoption of large-scale computer clusters and supercomputers. One of the pronounced scalability challenges is the management of connections between parallel processes, especially over connection-oriented interconnects such as VIA and InfiniBand. In this paper, we take on the challenge of designing efficient connection management for parallel programs over InfiniBand clusters. We propose adaptive connection management (ACM) to dynamically control the establishment of InfiniBand reliable connections (RC) based on the communication frequency between MPI processes. We have investigated two different ACM algorithms: an on-demand algorithm that starts with no InfiniBand RC connections, and a partial static algorithm that starts with only 2 * logN InfiniBand RC connections. We have designed and implemented both ACM algorithms in MVAPICH to study their benefits. Two mechanisms have been exploited for the establishment of new RC connections: one using InfiniBand unreliable datagram and the other using InfiniBand connection management. For both mechanisms, MPI communication issues such as progress rules, reliability, and race conditions are handled to ensure efficient and lightweight connection management. Our experimental results indicate that ACM algorithms can benefit parallel programs in terms of process initiation time, the number of active connections, and resource usage. For parallel programs on a 16-node cluster, they reduce the process initiation time by 15% and the initial memory usage by 18%.
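
The two ACM policies described above can be illustrated with a minimal sketch. This is a hypothetical model, not the MVAPICH implementation: real ACM manages InfiniBand RC queue pairs via UD-based handshakes or IB CM, whereas here a Python set simply stands in for the set of established connections, and the partial static policy pre-connects power-of-two neighbors to approximate the 2 * logN initial connections.

```python
import math

class AdaptiveConnectionManager:
    """Toy model of ACM connection policies (illustrative names only).

    mode="on_demand":       start with zero RC connections.
    mode="partial_static":  start with roughly 2 * log2(N) connections.
    """

    def __init__(self, rank, nprocs, mode="on_demand"):
        self.rank = rank
        self.nprocs = nprocs
        self.connected = set()  # stands in for established RC queue pairs
        if mode == "partial_static":
            # Pre-connect power-of-two neighbors in both directions,
            # giving at most 2 * log2(N) initial connections.
            for k in range(int(math.log2(nprocs))):
                for peer in ((rank + 2**k) % nprocs,
                             (rank - 2**k) % nprocs):
                    if peer != rank:
                        self.connected.add(peer)

    def send(self, peer):
        # Lazily establish a connection on first communication; in the
        # paper this handshake uses either UD messages or IB CM.
        if peer not in self.connected:
            self.connected.add(peer)
        # ... actual message transfer would happen here ...
```

For a 16-process job, the on-demand variant begins with no connections and only ever creates those the communication pattern actually needs, while the partial static variant begins with at most 2 * log2(16) = 8; both stay well below the N - 1 = 15 connections per process that fully static setup would create.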