The "fractional tree" algorithm for broadcasting and reduction is introduced. Its communication pattern interpolates between two well-known patterns: the sequential pipeline and the pipelined binary tree. For large systems and messages of intermediate size, the speedup over the better of these two simple methods can approach a factor of two. For networks that are not very densely connected, the new algorithm seems to be the best known method for the important case in which each processor has only a single (possibly bidirectional) channel into the communication network.
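To make the trade-off between the two baseline patterns concrete, here is a minimal sketch under a simplified cost model that is an assumption of this example, not taken from the abstract: a single-port, full-duplex network in which sending a packet of s units costs `alpha + beta * s`, and a message of `n` units is split into `k` equal packets. The function names, the step counts, and the parameters `alpha`, `beta`, `n`, and `k` are all illustrative. The sketch models only the sequential pipeline and the pipelined binary tree, not the fractional tree itself.

```python
def pipeline_steps(p: int, k: int) -> int:
    """Communication steps for a linear pipeline of p processors and k packets.

    The first packet needs p - 1 hops to reach the last processor;
    each further packet arrives one step later.
    """
    return (p - 1) + (k - 1)


def binary_tree_steps(p: int, k: int) -> int:
    """Rough upper bound on steps for a pipelined binary tree (p >= 2).

    Assumption: each inner node forwards a packet to its left child and
    then to its right child, so every packet costs two steps per level.
    """
    depth = p.bit_length() - 1  # floor(log2(p)): depth of a complete binary tree
    return 2 * (depth + k - 1)


def best_time(steps_fn, p: int, n: int, alpha: float, beta: float) -> float:
    """Minimum modeled broadcast time over all packet counts k = 1..n."""
    return min(
        steps_fn(p, k) * (alpha + beta * n / k)  # steps times cost per packet
        for k in range(1, n + 1)
    )


if __name__ == "__main__":
    # Hypothetical parameters chosen only to show the two regimes.
    for p, n in [(8, 10_000), (1024, 10)]:
        t_pipe = best_time(pipeline_steps, p, n, alpha=1.0, beta=1.0)
        t_tree = best_time(binary_tree_steps, p, n, alpha=1.0, beta=1.0)
        print(f"p={p}, n={n}: pipeline={t_pipe:.1f}, binary tree={t_tree:.1f}")
```

In this model the pipeline is better when the message is long relative to the number of processors, because every processor sends each packet only once. The binary tree is better when there are many processors and the message is short, because its depth grows only logarithmically. The intermediate regime between these two is the one the abstract identifies as the target of the fractional tree.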