Journal of Systems Architecture: the EUROMICRO Journal
In s-to-p broadcasting, s processors in a p-processor machine contain a message to be broadcast to all the processors, 1 ≤ s ≤ p. We present a number of different broadcasting algorithms that handle all ranges of s. We show how the performance of each algorithm is influenced by the distribution of the s source processors and by the relationships between the distribution and the characteristics of the interconnection network. For the Intel Paragon we show that for each algorithm and machine dimension there exist ideal distributions and distributions on which the performance degrades. For the Cray T3D we also demonstrate dependencies between distributions and machine sizes. To reduce the dependence of the performance on the distribution of sources, we propose a repositioning approach. In this approach, the initial distribution is turned into an ideal distribution of the target broadcasting algorithm. We report experimental results for the Intel Paragon and Cray T3D and discuss scalability and performance.
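As a rough illustration of the problem statement and of the repositioning idea, the following sketch simulates s-to-p broadcasting on a small machine. It is not one of the paper's algorithms: the ring exchange scheme, the choice of evenly spaced sources as the "ideal" distribution, and the names `reposition` and `ring_broadcast` are all assumptions made for this example.

```python
def reposition(sources, p):
    """Map each source processor to an evenly spaced target position.

    Even spacing is only an assumed stand-in for an 'ideal' distribution;
    the actual ideal distribution depends on the algorithm and the network.
    """
    s = len(sources)
    targets = [(i * p) // s for i in range(s)]
    return dict(zip(sorted(sources), targets))


def ring_broadcast(p, messages):
    """Simulate s-to-p broadcasting on a ring of p processors.

    messages maps a source processor index to its message. In each of the
    p - 1 rounds, every processor forwards everything it holds to its
    right-hand neighbour. Returns the set of messages held by each processor.
    """
    held = [set() for _ in range(p)]
    for proc, msg in messages.items():
        held[proc].add(msg)
    for _ in range(p - 1):
        # Snapshot what each processor receives from its left neighbour.
        incoming = [held[(i - 1) % p] for i in range(p)]
        held = [held[i] | incoming[i] for i in range(p)]
    return held


p = 8
initial = {0: "a", 1: "b", 2: "c"}          # s = 3 clustered sources
mapping = reposition(initial.keys(), p)      # old position -> new position
moved = {mapping[src]: msg for src, msg in initial.items()}
result = ring_broadcast(p, moved)
```

After the run, every entry of `result` contains all three messages. On a real machine the repositioning step costs communication of its own, so it is only worthwhile when the time saved by broadcasting from the better distribution exceeds that cost.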