All-to-all personalized communication, or complete exchange, is at the heart of numerous parallel computing applications. Several complete exchange algorithms have been proposed in the literature for wormhole-routed meshes. However, when applied to tori, these algorithms cannot exploit the wrap-around links to reduce complete exchange latency. In this paper, a new diagonal-propagation approach is proposed for developing a set of complete exchange algorithms for 2D and 3D tori. The approach exploits the symmetric interconnections of tori and yields a communication schedule consisting of several contention-free phases. The algorithms are indirect and use message combining to reduce the number of phases (message start-ups). It is shown that these algorithms effectively use the bisection bandwidth of a torus, which is twice that of an equal-sized mesh, to achieve complete exchange in almost half the best known complete exchange time on an equal-sized mesh. The effectiveness of these algorithms is verified through simulation studies for varying system and technological parameters. It is also demonstrated that synchronous implementations of these algorithms (with barriers introduced between phases) reduce complete exchange latency for large messages, while asynchronous implementations are better for small messages.
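The factor-of-two claim rests on a simple bisection argument, which can be sketched as follows. This is not the paper's diagonal-propagation algorithm; it is only a bandwidth lower bound under assumed parameters. It assumes a 2D k × k network (k even, k ≥ 4), a uniform message size m bytes between every node pair, and a per-link, per-direction bandwidth beta. Start-up costs and contention are ignored, and the function names are hypothetical. Half of all pairwise traffic must cross the middle cut, and a torus cut severs twice as many links as a mesh cut because of the wrap-around links.

```python
def bisection_links(k: int, torus: bool) -> int:
    """Links severed by a straight bisection of a k x k 2D mesh or torus.

    Assumes k is even and k >= 4 (for k = 2 a torus ring degenerates).
    """
    if k < 4 or k % 2:
        raise ValueError("k must be even and at least 4")
    # A cut between columns k/2 - 1 and k/2 severs one link per row.
    # In a torus, each row's wrap-around link also crosses the cut.
    return 2 * k if torus else k


def complete_exchange_lower_bound(k: int, m: float, beta: float, torus: bool) -> float:
    """Bandwidth-only lower bound (seconds) on complete exchange time.

    k:    nodes per dimension (N = k * k nodes)
    m:    bytes each node sends to every other node (assumed uniform)
    beta: bytes per second per link per direction (assumed)
    """
    n = k * k
    # Each of the N/2 nodes on one side sends m bytes to each of the
    # N/2 nodes on the other side, in each direction across the cut.
    crossing_bytes = (n // 2) * (n // 2) * m
    return crossing_bytes / (bisection_links(k, torus) * beta)


if __name__ == "__main__":
    k, m, beta = 16, 1024.0, 100e6  # illustrative values only
    mesh = complete_exchange_lower_bound(k, m, beta, torus=False)
    tor = complete_exchange_lower_bound(k, m, beta, torus=True)
    print(f"mesh  >= {mesh * 1e3:.3f} ms")
    print(f"torus >= {tor * 1e3:.3f} ms")
    print(f"ratio  = {mesh / tor:.1f}")
```

Because the crossing traffic is the same in both networks and only the number of cut links differs, the bound for the torus is exactly half that for the mesh. This matches the abstract's observation that an algorithm fully using the torus bisection can approach half the mesh's complete exchange time.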