Performance analysis of MPI collective operations

Authors:
Jelena Pješivac-Grbović;Thara Angskun;George Bosilca;Graham E. Fagg;Edgar Gabriel;Jack J. Dongarra
Affiliations:
Innovative Computing Laboratory, Computer Science Department, University of Tennessee, Knoxville, USA 37996-3450;Innovative Computing Laboratory, Computer Science Department, University of Tennessee, Knoxville, USA 37996-3450;Innovative Computing Laboratory, Computer Science Department, University of Tennessee, Knoxville, USA 37996-3450;Innovative Computing Laboratory, Computer Science Department, University of Tennessee, Knoxville, USA 37996-3450;Department of Computer Science, University of Houston, Houston, USA 77204-3010;Innovative Computing Laboratory, Computer Science Department, University of Tennessee, Knoxville, USA 37996-3450
Venue:
Cluster Computing
Year:
2007

Abstract

Previous studies of application usage show that the performance of collective communications is critical for high-performance computing. Despite active research in the field, a solution to the collective communication optimization problem that is both general and feasible is still missing.

In this paper, we analyze and attempt to improve intra-cluster collective communication in the context of the widely deployed MPI programming paradigm by extending accepted models of point-to-point communication, such as Hockney, LogP/LogGP, and PLogP, to collective operations. We compare the models' predictions against experimentally gathered data and, using these results, construct an optimal decision function for the broadcast collective. We quantitatively compare the quality of the model-based decision functions to the experimentally optimal one. We also introduce a new optimized tree-based broadcast algorithm, splitted-binary.

Our results show that all of the models can provide useful insights into various aspects of the different algorithms as well as their relative performance. Still, based on our findings, we believe that complete reliance on models would not yield optimal results. In addition, our experimental results have identified the gap parameter as the most critical for accurate modeling of both the classical point-to-point-based pipeline and our extensions to fan-out topologies.
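To make the idea of a model-based decision function concrete, the following is a minimal sketch, not the authors' implementation. It assumes the Hockney point-to-point model, T(m) = alpha + beta * m, and the usual textbook cost estimates for three broadcast algorithms (linear, binomial tree, and segmented pipeline). The function names, the default segment size, and the example latency and bandwidth values are all hypothetical. The splitted-binary algorithm and the LogP/LogGP and PLogP models discussed in the paper are not covered here.

```python
import math

def hockney_p2p(m, alpha, beta):
    """Hockney estimate for one point-to-point message of m bytes.
    alpha: latency in seconds; beta: transfer time per byte in seconds."""
    return alpha + beta * m

def bcast_linear(P, m, alpha, beta):
    # Assumption: the root sends the full message to each of the
    # other P - 1 processes, one after another.
    return (P - 1) * hockney_p2p(m, alpha, beta)

def bcast_binomial(P, m, alpha, beta):
    # Assumption: ceil(log2 P) rounds, each forwarding the full message.
    rounds = (P - 1).bit_length()  # equals ceil(log2 P) for P >= 1
    return rounds * hockney_p2p(m, alpha, beta)

def bcast_pipeline(P, m, alpha, beta, seg):
    # Assumption: the message is cut into n_s segments of at most seg
    # bytes and streamed down a chain of P processes. The first segment
    # needs P - 1 hops; each later segment arrives one step after it.
    n_s = max(1, math.ceil(m / seg))
    return (P + n_s - 2) * hockney_p2p(min(seg, m), alpha, beta)

def choose_bcast(P, m, alpha, beta, seg=8192):
    """Hypothetical decision function: pick the algorithm with the
    lowest predicted time under the Hockney model."""
    costs = {
        "linear": bcast_linear(P, m, alpha, beta),
        "binomial": bcast_binomial(P, m, alpha, beta),
        "pipeline": bcast_pipeline(P, m, alpha, beta, seg),
    }
    return min(costs, key=costs.get)

if __name__ == "__main__":
    # Illustrative parameters only: 50 us latency, 100 MB/s bandwidth.
    alpha, beta = 50e-6, 1e-8
    for size in (8, 64 * 1024, 8 * 1024 * 1024):
        print(size, choose_bcast(16, size, alpha, beta))
```

With these illustrative parameters the sketch prefers the binomial tree for very small messages and the segmented pipeline for very large ones. A decision function of this kind can then be compared against the choice that measurements identify as fastest, which is the comparison the abstract describes.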