Traditional collective communication algorithms are designed under the assumption that a node can communicate with only one other node at a time. On newer parallel architectures such as the IBM Blue Gene/L, a node can communicate with multiple nodes simultaneously. We have redesigned and reimplemented many of the MPI collective communication algorithms to take advantage of this ability to send on several links at once, including broadcast, reduce(-to-one), scatter, gather, allgather, reduce-scatter, and allreduce. We show that these new algorithms have lower expected costs than the lower bounds previously established under single-port models of parallel computation. We include results comparing their performance to that of the default implementations in IBM's MPI.
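To illustrate why the single-port lower bounds no longer apply, the following is a minimal sketch (not the paper's actual algorithms) counting broadcast rounds when each informed node can send to k uninformed nodes simultaneously: the informed set multiplies by k + 1 per round instead of doubling, so the round count drops from ceil(log2 p) to ceil(log_{k+1} p). The function name and parameters are illustrative assumptions.

```python
def broadcast_rounds(p, ports):
    """Rounds needed to broadcast one message to p nodes when each
    informed node can send to `ports` uninformed nodes per round.
    The informed set multiplies by (ports + 1) each round."""
    informed, rounds = 1, 0
    while informed < p:
        informed *= ports + 1
        rounds += 1
    return rounds

# Single-port model: classic binomial-tree count of ceil(log2 p) rounds.
assert broadcast_rounds(1024, 1) == 10
# With 3 simultaneous links per node, ceil(log4 1024) = 5 rounds suffice.
assert broadcast_rounds(1024, 3) == 5
```

Latency-bound costs shrink accordingly; the paper's algorithms exploit the same multiplicative effect for the bandwidth terms of the other collectives as well.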