The design and implementation of MPI collective operations for clusters in long-and-fast networks

  • Authors:
  • Motohiko Matsuda; Tomohiro Kudoh; Yuetsu Kodama; Ryousei Takano; Yutaka Ishikawa

  • Affiliations:
  • Grid Technology Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan (Matsuda, Kudoh, Kodama, Takano); The University of Tokyo, Tokyo, Japan (Ishikawa)

  • Venue:
  • Cluster Computing
  • Year:
  • 2008

Abstract

Several MPI systems for Grid environments, in which clusters are connected by wide-area networks, have been proposed. However, the collective-communication algorithms in such MPI systems assume relatively low-bandwidth wide-area networks, and they are not designed for the fast wide-area networks that are becoming available. On the other hand, for cluster MPI systems, the bcast algorithm of van de Geijn et al. and the allreduce algorithm of Rabenseifner have been proposed, both of which are efficient in a high-bisection-bandwidth environment. We modify these algorithms to utilize fast wide-area inter-cluster networks effectively and to limit the number of nodes that transfer data simultaneously through the wide-area network, thereby avoiding congestion. We confirmed the effectiveness of the modified algorithms by experiments using a 10 Gbps emulated WAN environment. The environment consists of two clusters, where each cluster consists of nodes with 1 Gbps Ethernet links and a switch with a 10 Gbps uplink. The two clusters are connected through a 10 Gbps WAN emulator which can insert latency. In a 10 millisecond latency environment, with a message size of 32 MB, the proposed bcast and allreduce are 1.6 and 3.2 times faster, respectively, than the algorithms used in existing MPI systems for Grid environments.
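
As background, the two cluster-oriented collectives the abstract builds on both decompose a large message into per-process chunks: the van de Geijn bcast is a scatter followed by an allgather, and the Rabenseifner allreduce is a reduce-scatter followed by an allgather. Below is a minimal sketch of these base algorithms expressed with standard MPI calls; it is not the authors' modified, WAN-aware implementation, and it assumes for simplicity that the element count is divisible by the number of processes.

    /* Minimal sketch of the base algorithms (not the authors'
     * WAN-aware versions).  Assumes count % nprocs == 0. */
    #include <mpi.h>
    #include <stdlib.h>

    /* van de Geijn-style bcast: scatter the message, then allgather. */
    static void bcast_scatter_allgather(double *buf, int count,
                                        int root, MPI_Comm comm)
    {
        int nprocs;
        MPI_Comm_size(comm, &nprocs);
        int chunk = count / nprocs;
        double *piece = malloc(chunk * sizeof *piece);

        /* Each process receives one chunk of the root's message... */
        MPI_Scatter(buf, chunk, MPI_DOUBLE, piece, chunk, MPI_DOUBLE,
                    root, comm);
        /* ...and the chunks are reassembled on every process. */
        MPI_Allgather(piece, chunk, MPI_DOUBLE, buf, chunk, MPI_DOUBLE,
                      comm);
        free(piece);
    }

    /* Rabenseifner-style allreduce: reduce-scatter, then allgather. */
    static void allreduce_rsag(const double *sendbuf, double *recvbuf,
                               int count, MPI_Comm comm)
    {
        int nprocs, rank;
        MPI_Comm_size(comm, &nprocs);
        MPI_Comm_rank(comm, &rank);
        int chunk = count / nprocs;
        int *counts = malloc(nprocs * sizeof *counts);
        for (int i = 0; i < nprocs; i++)
            counts[i] = chunk;

        /* Each process obtains the fully reduced values of its own
         * chunk, placed at its slot in recvbuf... */
        MPI_Reduce_scatter(sendbuf, recvbuf + (size_t)rank * chunk,
                           counts, MPI_DOUBLE, MPI_SUM, comm);
        /* ...then all processes exchange their reduced chunks in place. */
        MPI_Allgather(MPI_IN_PLACE, chunk, MPI_DOUBLE,
                      recvbuf, chunk, MPI_DOUBLE, comm);
        free(counts);
    }

Per the abstract, the paper's contribution lies in adapting the inter-cluster phases of these patterns so that only a controlled number of nodes send through the wide-area link at once.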