Bandwidth-Optimal Complete Exchange on Wormhole-Routed 2D/3D Torus Networks: A Diagonal-Propagation Approach

Authors:
Yu-Chee Tseng;Ting-Hsien Lin;Dhabaleswar K. Panda;Sandeep K. S. Gupta
Affiliations:
National Central Univ., Chung-Li, Taiwan;Ohio State Univ., Columbus;Ohio State Univ., Columbus;Colorado State Univ., Ft. Collins
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1997

Abstract

All-to-all personalized communication, or complete exchange, is at the heart of numerous applications in parallel computing. Several complete exchange algorithms have been proposed in the literature for wormhole-routed meshes. However, when applied to tori, these algorithms cannot take advantage of the wrap-around interconnections to reduce the latency of complete exchange. In this paper, a new diagonal-propagation approach is proposed to develop a set of complete exchange algorithms for 2D and 3D tori. This approach exploits the symmetric interconnections of tori and allows the development of a communication schedule consisting of several contention-free phases. The algorithms are indirect in nature and use message combining to reduce the number of phases (message start-ups). It is shown that these algorithms effectively use the bisection bandwidth of a torus, which is twice that of an equal-sized mesh, to achieve complete exchange in almost half the best known complete exchange time on an equal-sized mesh. The effectiveness of these algorithms is verified through simulation studies for varying system and technological parameters. It is also demonstrated that synchronous implementations of these algorithms (which introduce barriers between phases) lead to reduced latency for complete exchange with large messages, while asynchronous implementations are better for smaller messages.
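
The factor-of-two bisection claim can be illustrated with a small counting sketch. The code below is not taken from the paper: the function name `crossing_links` and its parameters are hypothetical, and it only counts the links that cross one balanced vertical cut of an n x n grid (n assumed even and at least 4) rather than proving that this cut is minimal.

```python
def crossing_links(n, wraparound):
    """Count undirected links crossing the vertical cut that splits an
    n x n grid into a left half and a right half.

    wraparound=False models a mesh; wraparound=True models a torus.
    Illustrative assumption: n is even and at least 4, so that
    wrap-around links are distinct from the direct links.
    """
    if n < 4 or n % 2:
        raise ValueError("n must be even and at least 4")
    half = n // 2
    links = set()
    for row in range(n):
        for col in range(n):
            # Visit each node's neighbour in the +x and +y directions,
            # so every undirected link is generated exactly once.
            for d_row, d_col in ((0, 1), (1, 0)):
                r, c = row + d_row, col + d_col
                if wraparound:
                    r %= n
                    c %= n
                elif r >= n or c >= n:
                    continue  # a mesh has no link past the boundary
                # A link crosses the cut when its endpoints lie in
                # different halves of the column range.
                if (col < half) != (c < half):
                    links.add(frozenset(((row, col), (r, c))))
    return len(links)


if __name__ == "__main__":
    for size in (4, 8, 16):
        mesh = crossing_links(size, wraparound=False)
        torus = crossing_links(size, wraparound=True)
        print(size, mesh, torus)
```

In this sketch the torus count is twice the mesh count for every tested size, because each row contributes one extra wrap-around link across the cut.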