In this paper, we propose new routing schemes for all-to-all personalized communication (also known as complete exchange) in wormhole-routed, one-port tori. On tori with equal size along every dimension, our algorithms achieve asymptotically optimal startup time and transmission time simultaneously. The results have several notable features: 1) a gather-scatter tree is used to achieve optimal startup time; 2) messages are routed along shortest paths to achieve optimal transmission time; 3) network-partitioning techniques reduce the constant factor in the transmission time; and 4) a dimension-by-dimension, gather-scatter-tree approach makes the results applicable to nonsquare tori of any size. Among existing algorithms, some are optimal in only one of the two costs (startup or transmission), while others, although asymptotically optimal in both, incur much larger constant factors. Both numerical analysis and experiments show that our scheme significantly reduces total communication latency compared with existing results.
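The tension between startup-optimal and transmission-optimal algorithms can be made concrete with a toy cost model. The sketch below is not the authors' gather-scatter-tree scheme. It compares two textbook baselines under a simple linear model, T = (number of steps) * t_s + (bytes sent sequentially) * t_c, assuming one-port nodes, no link contention, and N a power of two. The function names and parameters (`n`, `m`, `t_s`, `t_c`) are hypothetical. The "combining" baseline is a Bruck-style index algorithm: in round k, node p forwards to node (p + 2^k) mod N every block whose remaining offset has bit k set. It uses only log2 N startups, the usual lower bound for one-port complete exchange, but each message carries N/2 blocks.

```python
"""Toy startup/transmission cost comparison for complete exchange.

Assumptions (illustrative only, not from the paper): each message of b bytes
costs t_s + b * t_c, nodes are one-port (one send and one receive per step),
link contention is ignored, and the node count n is a power of two.
"""
import math


def _check_power_of_two(n):
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a positive power of two")


def direct_exchange_cost(n, m, t_s, t_c):
    # Pairwise exchange: n-1 steps, each node sends one m-byte block per step.
    _check_power_of_two(n)
    steps = n - 1
    return steps, steps * (t_s + m * t_c)


def combining_exchange_cost(n, m, t_s, t_c):
    # Bruck-style index algorithm: log2(n) steps, each message carries n/2 blocks.
    _check_power_of_two(n)
    steps = int(math.log2(n))
    return steps, steps * (t_s + (n // 2) * m * t_c)


def simulate_combining_exchange(n):
    """Check that log2(n) combining rounds deliver every (src, dst) block.

    held[p] is the set of (src, dst) blocks currently stored at node p.
    Returns (rounds, largest message in blocks, all blocks delivered?).
    """
    _check_power_of_two(n)
    held = [{(p, d) for d in range(n)} for p in range(n)]
    rounds = int(math.log2(n))
    max_blocks_per_msg = 0
    for k in range(rounds):
        outgoing = []
        for p in range(n):
            # Send blocks whose remaining offset (dst - p) mod n has bit k set.
            send = {(s, d) for (s, d) in held[p] if ((d - p) % n) >> k & 1}
            held[p] -= send
            outgoing.append(((p + (1 << k)) % n, send))
            max_blocks_per_msg = max(max_blocks_per_msg, len(send))
        # Deliver only after every node has chosen its message for this round.
        for dest, blocks in outgoing:
            held[dest] |= blocks
    delivered = all(held[p] == {(s, p) for s in range(n)} for p in range(n))
    return rounds, max_blocks_per_msg, delivered


if __name__ == "__main__":
    # Small messages: combining wins (fewer startups).
    print(direct_exchange_cost(64, 1, 100, 1))      # (63, 6363)
    print(combining_exchange_cost(64, 1, 100, 1))   # (6, 792)
    # Large messages: direct exchange wins (less data forwarded).
    print(direct_exchange_cost(64, 1000, 100, 1))     # (63, 69300)
    print(combining_exchange_cost(64, 1000, 100, 1))  # (6, 192600)
    print(simulate_combining_exchange(16))            # (4, 8, True)
```

Under these assumptions, neither baseline is good in both terms: direct exchange has the smaller transmission term but N-1 startups, while combining has the minimal number of startups but forwards (N/2) log2 N blocks per node. An algorithm that is asymptotically optimal in both costs, as claimed above, has to avoid this tradeoff.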