Traditional collective communication algorithms are designed under the assumption that a node can communicate with only one other node at a time. On newer parallel architectures such as the IBM Blue Gene/L, a node can communicate with multiple nodes simultaneously. We have redesigned and reimplemented many of the MPI collective communication algorithms to take advantage of this ability to send on several links at once, including broadcast, reduce(-to-one), scatter, gather, allgather, reduce-scatter, and allreduce. We show that these new algorithms have lower expected costs than the lower bounds previously established under single-port models of parallel computation. We include results comparing their performance to that of the default implementations in IBM's MPI.
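To illustrate why the single-port lower bounds no longer apply, the following is a minimal sketch (not the paper's actual algorithms) counting broadcast rounds when each informed node can send to k uninformed nodes simultaneously: the informed set multiplies by k + 1 per round instead of doubling, so the round count drops from ceil(log2 p) to ceil(log_{k+1} p). The function name and parameters are illustrative assumptions.

```python
def broadcast_rounds(p, ports):
    """Rounds needed to broadcast one message to p nodes when each
    informed node can send to `ports` uninformed nodes per round.
    The informed set multiplies by (ports + 1) each round."""
    informed, rounds = 1, 0
    while informed < p:
        informed *= ports + 1
        rounds += 1
    return rounds

# Single-port model: classic binomial-tree count of ceil(log2 p) rounds.
assert broadcast_rounds(1024, 1) == 10
# With 3 simultaneous links per node, ceil(log4 1024) = 5 rounds suffice.
assert broadcast_rounds(1024, 3) == 5
```

Latency-bound costs shrink accordingly; the paper's algorithms exploit the same multiplicative effect for the bandwidth terms of the other collectives as well.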