A study of process arrival patterns for MPI collective operations

Authors: Ahmad Faraj; Pitch Patarasuk; Xin Yuan
Affiliations: IBM Corporation, Rochester, MN; Florida State University, Tallahassee, FL; Florida State University, Tallahassee, FL
Venue: Proceedings of the 21st annual international conference on Supercomputing
Year: 2007

Abstract

The process arrival pattern, which denotes the times at which different processes arrive at an MPI collective operation, can have a significant impact on the performance of the operation. In this work, we characterize the process arrival patterns in a set of MPI programs on two common cluster platforms, use a micro-benchmark to study the process arrival patterns in MPI programs with balanced loads, and investigate the impact of the process arrival pattern on collective algorithms. Our results show that (1) the differences between the times at which different processes arrive at a collective operation are usually large enough to affect performance; (2) application developers in general cannot effectively control the process arrival patterns in their MPI programs in cluster environments, since balancing loads at the application level does not balance the process arrival patterns; and (3) the performance of collective communication algorithms is sensitive to process arrival patterns. These results indicate that the process arrival pattern is an important factor that must be taken into account when developing and optimizing MPI collective routines. We propose a scheme that achieves high performance under different process arrival patterns, and demonstrate that explicitly considering the process arrival pattern yields more efficient MPI collective routines than the current ones.
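
To make the notion of a process arrival pattern concrete, the following minimal sketch summarizes the arrival times of a group of processes at a single collective call. It is an illustration only: the function name `arrival_imbalance`, the two summary metrics (the spread between the last and first arrival, and the mean absolute deviation from the average arrival time), and the sample timestamps are assumptions made here, not definitions or measurements taken from the paper.

```python
from statistics import mean


def arrival_imbalance(arrival_times):
    """Summarize how unevenly processes arrive at one collective call.

    arrival_times: one timestamp per process, all in the same unit.
    Returns (worst_case, average_case), where
      worst_case   = latest arrival minus earliest arrival
      average_case = mean absolute deviation from the average arrival time
    Both metrics are illustrative choices (an assumption of this sketch).
    """
    if not arrival_times:
        raise ValueError("need at least one arrival time")
    avg = mean(arrival_times)
    worst_case = max(arrival_times) - min(arrival_times)
    average_case = mean(abs(t - avg) for t in arrival_times)
    return worst_case, average_case


# Hypothetical arrival timestamps (microseconds) for four processes.
balanced = [100.0, 100.0, 100.0, 100.0]
skewed = [100.0, 104.0, 130.0, 166.0]

print(arrival_imbalance(balanced))  # (0.0, 0.0)
print(arrival_imbalance(skewed))    # (66.0, 23.0)
```

In a real MPI program the timestamps would come from each process recording a clock value (for example with `MPI_Wtime`) immediately before entering the collective; comparing such values across nodes additionally requires synchronized or offset-corrected clocks, which this sketch does not address.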