Optimal Broadcasting in Mesh-Connected Architectures

Authors:
Michael Barnett;David G. Payne;Robert A. van de Geijn
Venue:
Optimal Broadcasting in Mesh-Connected Architectures
Year:
1991

Citing 0
Cited 11

Distributed memory matrix-vector multiplication and conjugate gradient algorithms
Proceedings of the 1993 ACM/IEEE conference on Supercomputing

A dominating set model for broadcast in all-port wormhole-routed 2D mesh networks
ICS '94 Proceedings of the 8th international conference on Supercomputing

Static and Run-Time Algorithms for All-to-Many Personalized Communication on Permutation Networks
IEEE Transactions on Parallel and Distributed Systems

Circuit-Switched Broadcasting in Torus Networks
IEEE Transactions on Parallel and Distributed Systems

Implementing multidestination worms in switch-based parallel systems: architectural alternatives and their impact
Proceedings of the 24th annual international symposium on Computer architecture

Efficient Broadcast and Multicast on Multistage Interconnection Networks Using Multiport Encoding
IEEE Transactions on Parallel and Distributed Systems

Implementing Multidestination Worms in Switch-Based Parallel Systems: Architectural Alternatives and Their Impact
IEEE Transactions on Parallel and Distributed Systems

Building a high-performance collective communication library
Proceedings of the 1994 ACM/IEEE conference on Supercomputing

Efficient communication algorithms for pipeline multicomputers
Proceedings of the 1994 ACM/IEEE conference on Supercomputing

Collective communication on architectures that support simultaneous communication over multiple links
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming

Faster topology-aware collective algorithms through non-minimal communication
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming

Abstract

In this paper, we disprove the common assumption that the time for broadcasting in a mesh is at best proportional to the square root of the number of processors, at least in the presence of worm-hole routing. We present an optimal algorithm for broadcasting in mesh-connected distributed-memory architectures with worm-hole routing. By organizing the processing nodes in a logical spanning tree, the algorithm executes in time proportional to the logarithm of the number of nodes without inducing contention in the communication network. We restrict the number of nodes in each dimension of the processor mesh to be a power of two. Our method provides insight into how to avoid and/or reduce network contention on meshes for other communication operations. Experimental results on the Intel Touchstone Delta system are included.
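
The idea of a logarithmic-time, contention-free broadcast can be illustrated with a small sketch. The code below is not taken from the paper; it is the familiar recursive-halving broadcast on a one-dimensional array of p nodes (p a power of two), written under the assumption that a message between two nodes occupies every link between them, as it would under worm-hole routing. The function names `broadcast_schedule`, `links_used`, and `is_contention_free` are hypothetical and introduced only for this illustration.

```python
def broadcast_schedule(p, root=0):
    """Return a list of steps; each step is a list of (src, dst) sends."""
    if p < 1 or p & (p - 1):
        raise ValueError("p must be a power of two")
    # Each segment is (lo, hi, holder): nodes lo..hi-1, where `holder`
    # is the one node in the segment that already has the message.
    segments = [(0, p, root)]
    steps = []
    while segments[0][1] - segments[0][0] > 1:
        step, next_segments = [], []
        for lo, hi, holder in segments:
            mid = (lo + hi) // 2
            half = mid - lo
            # Send to the node at the same offset in the other half,
            # then treat the two halves as independent sub-problems.
            if holder < mid:
                partner = holder + half
                next_segments += [(lo, mid, holder), (mid, hi, partner)]
            else:
                partner = holder - half
                next_segments += [(lo, mid, partner), (mid, hi, holder)]
            step.append((holder, partner))
        steps.append(step)
        segments = next_segments
    return steps


def links_used(src, dst):
    """Links crossed by a message; link i joins node i and node i + 1."""
    return set(range(min(src, dst), max(src, dst)))


def is_contention_free(steps):
    """True if no two messages in the same step share a link."""
    for step in steps:
        seen = set()
        for src, dst in step:
            links = links_used(src, dst)
            if seen & links:
                return False
            seen |= links
    return True


if __name__ == "__main__":
    for i, step in enumerate(broadcast_schedule(8), start=1):
        print(f"step {i}: {step}")
```

Each step halves the segment size, so the schedule has log2(p) steps, and every message stays inside its own segment, so no two messages in a step share a link. One plausible way to extend the sketch to a two-dimensional mesh with power-of-two dimensions is to run the schedule along the root's row and then along every column in parallel, which gives log2(rows) + log2(cols) = log2(p) steps; whether this matches the construction in the paper is not stated in the abstract.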