On SMP clusters, mixed-mode collective MPI communications, which use shared-memory communication within each SMP node and point-to-point communication between nodes, are more efficient than conventional implementations. In a previous study, we proposed several new methods that made mixed-mode collective communications significantly faster than pure point-to-point ones. However, achieving optimal performance required tuning many parameters, which was done by exhaustively testing every possible setting and was very time consuming. In this study, we propose a new performance model that captures the special characteristics of mixed-mode collective communications. Unlike existing performance models, which consider only point-to-point communication, our model accounts for both shared-memory and point-to-point communication, and its predictions are accurate enough to rule out most parameter settings without testing them by execution. Based on this model, we develop a number of tuning strategies that reduce the overall tuning time to roughly 10% of the original.
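To illustrate the idea of a two-level performance model, the sketch below predicts broadcast cost as an inter-node point-to-point term plus an intra-node shared-memory term. All function names, tree shapes, and latency/bandwidth constants are hypothetical placeholders for illustration, not the model or parameters from the paper; the inter-node term follows a simple Hockney-style alpha-beta cost.

```python
import math

def p2p_cost(msg_bytes, alpha, beta):
    """Time for one message transfer: startup latency + bytes / bandwidth."""
    return alpha + msg_bytes * beta

def pure_p2p_bcast_cost(msg_bytes, nodes, cores_per_node,
                        alpha_net=5e-6, beta_net=1e-9):
    """Conventional broadcast: a binomial tree over all processes,
    where every edge is a network-class point-to-point message."""
    total_procs = nodes * cores_per_node
    return math.ceil(math.log2(total_procs)) * p2p_cost(msg_bytes, alpha_net, beta_net)

def mixed_mode_bcast_cost(msg_bytes, nodes, cores_per_node,
                          alpha_net=5e-6, beta_net=1e-9,    # network: ~5 us, ~1 GB/s
                          alpha_shm=5e-7, beta_shm=2e-10):  # shared memory: ~0.5 us, ~5 GB/s
    """Mixed-mode broadcast: a binomial tree over one leader per node
    via the network, then a shared-memory tree inside each node."""
    inter = math.ceil(math.log2(nodes)) * p2p_cost(msg_bytes, alpha_net, beta_net)
    intra = math.ceil(math.log2(cores_per_node)) * p2p_cost(msg_bytes, alpha_shm, beta_shm)
    return inter + intra

# A tuner can rank candidate settings by predicted cost and only
# benchmark the most promising ones, instead of timing every setting.
msg = 64 * 1024
print(mixed_mode_bcast_cost(msg, nodes=16, cores_per_node=8))
print(pure_p2p_bcast_cost(msg, nodes=16, cores_per_node=8))
```

With these (assumed) parameters the mixed-mode prediction is lower than the pure point-to-point one, reflecting the paper's premise that intra-node hops over shared memory are much cheaper than network hops.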