A bandwidth latency tradeoff for broadcast and reduction

Authors:
Peter Sanders; Jop F. Sibeyn
Affiliations:
Max-Planck-Institut für Informatik, Stuhlsatzenhausweg 85, 66123 Saarbrücken, Germany; Department of Computing Science, Umeå University, 901 87 Umeå, Sweden
Venue:
Information Processing Letters
Year:
2003

Abstract

The "fractional tree" algorithm for broadcasting and reduction is introduced. Its communication pattern interpolates between two well-known patterns: the sequential pipeline and the pipelined binary tree. The speedup over the better of these two simple methods can approach two for large systems and messages of intermediate size. For networks that are not very densely connected, the new algorithm seems to be the best known method for the important case in which each processor has only a single (possibly bidirectional) channel into the communication network.
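
To make the tradeoff between the two simple patterns concrete, the sketch below compares them under a simplified linear cost model. Everything in it is an assumption made for illustration and is not taken from the paper: a packet of m units is assumed to cost alpha + beta * m to send, the message is cut into k equal packets, the pipeline is assumed to need (p - 1) + (k - 1) steps, and a binary-tree node on a single channel is assumed to need two sends per packet. The names `alpha`, `beta`, `k`, and all function names are hypothetical. The fractional tree itself is not modeled.

```python
def pipeline_time(p, n, alpha, beta, k):
    """Assumed model: chain of p >= 2 processors, n units cut into k packets."""
    steps = (p - 1) + (k - 1)  # fill the chain, then drain the remaining packets
    return steps * (alpha + beta * n / k)


def binary_tree_time(p, n, alpha, beta, k):
    """Assumed model: each node forwards every packet to two children in turn."""
    depth = p.bit_length() - 1  # floor(log2(p)) for p >= 1
    steps = 2 * (depth + k - 1)  # two sends per packet on a single channel
    return steps * (alpha + beta * n / k)


def best_time(cost, p, n, alpha, beta, max_k=100_000):
    """Brute-force the packet count k that minimizes the given cost function."""
    return min(cost(p, n, alpha, beta, k) for k in range(1, min(n, max_k) + 1))


def faster_pattern(p, n, alpha=1.0, beta=1.0):
    """Name the pattern with the lower modeled time for p processors, n units."""
    t_pipe = best_time(pipeline_time, p, n, alpha, beta)
    t_tree = best_time(binary_tree_time, p, n, alpha, beta)
    return "pipeline" if t_pipe < t_tree else "binary tree"


if __name__ == "__main__":
    for n in (1, 10**3, 10**6):
        print(n, faster_pattern(64, n))
```

In this assumed model the binary tree has the lower startup latency and the pipeline has the lower per-unit cost, so short messages favor the tree and long messages favor the pipeline. The intermediate sizes between those two regimes are where the abstract places the gain of the fractional tree.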