Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems
IEEE Transactions on Parallel and Distributed Systems
Automatically tuned collective communications
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Algorithms for Supporting Compiled Communication
IEEE Transactions on Parallel and Distributed Systems
Static Communications in Parallel Scientific Programs
PARLE '94 Proceedings of the 6th International PARLE Conference on Parallel Architectures and Languages Europe
An empirical performance evaluation of scalable scientific applications
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
A bandwidth latency tradeoff for broadcast and reduction
Information Processing Letters
Statistical Analysis of Message Passing Programs to Guide Computer Design
HICSS '98 Proceedings of the Thirty-First Annual Hawaii International Conference on System Sciences - Volume 7
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Performance Analysis of MPI Collective Operations
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 15 - Volume 16
Automatic generation and tuning of MPI collective communication routines
Proceedings of the 19th annual international conference on Supercomputing
An MPI prototype for compiled communication on Ethernet switched clusters
Journal of Parallel and Distributed Computing - Special issue: Design and performance of networks for super-, cluster-, and grid-computing: Part I
Efficient Barrier and Allreduce on InfiniBand clusters using multicast and adaptive algorithms
CLUSTER '04 Proceedings of the 2004 IEEE International Conference on Cluster Computing
STAR-MPI: self tuned adaptive routines for MPI collective operations
Proceedings of the 20th annual international conference on Supercomputing
A Message Scheduling Scheme for All-to-All Personalized Communication on Ethernet Switched Clusters
IEEE Transactions on Parallel and Distributed Systems
Bandwidth efficient all-to-all broadcast on switched clusters
International Journal of Parallel Programming
Pipelined broadcast on Ethernet switched clusters
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Techniques for pipelined broadcast on Ethernet switched clusters
Journal of Parallel and Distributed Computing
Bandwidth optimal all-reduce algorithms for clusters of workstations
Journal of Parallel and Distributed Computing
Process Arrival Pattern and Shared Memory Aware Alltoall on InfiniBand
Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Process arrival pattern, which denotes the times at which different processes arrive at an MPI collective operation, can have a significant impact on the performance of the operation. In this work, we characterize the process arrival patterns in a set of MPI programs on two common cluster platforms, use a micro-benchmark to study the process arrival patterns in MPI programs with balanced loads, and investigate the impact of different process arrival patterns on collective algorithms. Our results show that (1) the differences between the times when different processes arrive at a collective operation are usually large enough to affect performance; (2) application developers in general cannot effectively control the process arrival patterns in their MPI programs in the cluster environment: balancing loads at the application level does not balance the process arrival patterns; and (3) the performance of collective communication algorithms is sensitive to process arrival patterns. These results indicate that the process arrival pattern is an important factor that must be taken into consideration when developing and optimizing MPI collective routines. We propose a scheme that achieves high performance under different process arrival patterns, and demonstrate that by explicitly considering process arrival patterns, MPI collective routines more efficient than the current ones can be obtained.
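To make the notion of an arrival pattern concrete, the sketch below computes two plausible imbalance metrics over a list of per-process arrival timestamps: the worst-case spread (latest minus earliest arrival) and the average lateness relative to the earliest arriver. This is a minimal illustration, not the paper's exact metric definitions; the `arrival_times` values are hypothetical, standing in for timestamps one might record with `MPI_Wtime` immediately before each process enters the collective call.

```python
def imbalance_metrics(arrival_times):
    """Summarize a process arrival pattern.

    arrival_times: one timestamp per process, taken just before the
    process enters the collective operation (hypothetical data here).
    Returns (worst_case_spread, average_lateness):
      - worst_case_spread: latest arrival minus earliest arrival
      - average_lateness: mean delay of all processes behind the
        earliest arriver
    """
    earliest = min(arrival_times)
    latest = max(arrival_times)
    worst_case_spread = latest - earliest
    average_lateness = sum(t - earliest for t in arrival_times) / len(arrival_times)
    return worst_case_spread, average_lateness


# Example: four processes reach the collective at slightly different
# times even though their computational loads were "balanced".
arrival_times = [0.000, 0.004, 0.001, 0.010]  # seconds, hypothetical
spread, lateness = imbalance_metrics(arrival_times)
print(f"worst-case spread: {spread:.3f} s, average lateness: {lateness:.3f} s")
```

A micro-benchmark in the spirit of the one described above would gather such per-process timestamps (e.g., via `MPI_Gather` of `MPI_Wtime` readings) across many invocations of a collective and report these statistics; if the worst-case spread is comparable to or larger than the collective's own latency, the arrival pattern can dominate the measured performance.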