Performance Analysis of MPI Collective Operations

Authors:
Jelena Pjesivac-Grbovic, Thara Angskun, George Bosilca, Graham E. Fagg, Edgar Gabriel, Jack J. Dongarra
Affiliations:
University of Tennessee Computer Science Department, Knoxville (all authors)
Venue:
IPDPS '05: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05), Workshop 15, Volume 16
Year:
2005

Abstract

Previous studies of application usage show that the performance of collective communications is critical for high-performance computing, yet it is often overlooked in favor of point-to-point performance. In this paper, we analyze and attempt to improve intra-cluster collective communication in the context of the widely deployed MPI programming paradigm by extending accepted models of point-to-point communication, such as Hockney, LogP/LogGP, and PLogP, to collective operations. We compare the models' predictions with experimentally gathered data and use our findings to optimize the implementation of collective operations in the FT-MPI library.
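As a hedged illustration only, and not the authors' own formulas, the sketch below encodes the usual textbook forms of two of the point-to-point models named above and composes them into simple broadcast cost estimates. The Hockney model charges alpha + beta*m for an m-byte message; the LogGP form used here is L + 2o + (m-1)G. A linear broadcast is assumed to have the root send P-1 messages strictly one after another, and a binomial-tree broadcast is assumed to finish in ceil(log2 P) rounds of one message each. The class and function names, the no-overlap and no-contention assumptions, and the example parameter values are all hypothetical; the LogGP gap g and the PLogP model, whose parameters vary with message size, are deliberately left out.

```python
"""Hypothetical sketch: point-to-point cost models composed into broadcast costs.

Assumptions (not from the paper): sends never overlap, the network is
contention-free, and every tree round costs exactly one point-to-point message.
"""
import math
from dataclasses import dataclass


@dataclass
class Hockney:
    alpha: float  # per-message start-up latency (s)
    beta: float   # transfer time per byte (s/byte)

    def p2p(self, m: int) -> float:
        # Time to deliver one m-byte message: alpha + beta * m
        return self.alpha + self.beta * m


@dataclass
class LogGP:
    L: float  # network latency (s)
    o: float  # CPU overhead paid once by the sender and once by the receiver (s)
    G: float  # gap per byte for long messages (s/byte)
    # The per-message gap g is omitted: this sketch only costs single messages.

    def p2p(self, m: int) -> float:
        # Sender overhead + (m - 1) byte gaps + latency + receiver overhead
        return self.L + 2 * self.o + max(m - 1, 0) * self.G


def linear_bcast(model, P: int, m: int) -> float:
    """Root sends the m-byte message to the other P - 1 processes sequentially."""
    return max(P - 1, 0) * model.p2p(m)


def binomial_bcast(model, P: int, m: int) -> float:
    """Binomial tree: the set of processes holding the data doubles each round."""
    rounds = (P - 1).bit_length() if P > 1 else 0  # equals ceil(log2 P) for P >= 1
    return rounds * model.p2p(m)


if __name__ == "__main__":
    net = Hockney(alpha=1e-5, beta=1e-9)  # illustrative values, not measurements
    for P in (2, 8, 64):
        print(P, linear_bcast(net, P, 1024), binomial_bcast(net, P, 1024))
```

Using the integer `bit_length` trick instead of `math.log2` keeps the round count exact for every process count. Comparing the two functions for a fixed model shows the kind of question such models are meant to answer, namely which algorithm a library should pick for a given process count and message size, although real measurements are needed to confirm any prediction.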