Multistage interconnection networks are a popular class of interconnection architecture for constructing scalable parallel computers (SPCs). The focus of this paper is on wormhole-routed multistage networks supporting turnaround routing. Existing machines characterized by such a system model include the IBM SP-1, TMC CM-5, and Meiko CS-2.

Efficient collective communication among processor nodes is critical to the performance of SPCs. A system-level multicast service, in which the same message is delivered from a source node to an arbitrary number of destination nodes, is fundamental in supporting collective communication primitives, including application-level broadcast, reduction, and barrier synchronization. This paper addresses how to implement multicast services efficiently in wormhole-routed multistage networks, in the absence of hardware multicast support, by exploiting the properties of the switching technology. An optimal multicast algorithm is proposed. Results from implementations on a 64-node SP-1 show that the proposed algorithm significantly outperforms the application-level broadcast primitives provided by existing collective communication libraries, including the public-domain MPI.
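To make the idea of multicast without hardware support concrete, the sketch below builds a generic recursive-halving schedule out of unicast messages. It is not the algorithm proposed in the paper, which the abstract does not describe. It assumes a one-port model (each node sends at most one message per step) and ignores network contention, which is the property a wormhole-aware algorithm would additionally have to guarantee. The name `multicast_schedule` and the sorted node ordering are assumptions made for illustration only.

```python
def multicast_schedule(source, destinations):
    """Return a list of steps; each step is a list of (sender, receiver) pairs.

    Generic recursive-halving sketch under a one-port assumption.
    Channel contention inside the network is NOT modeled here.
    """
    # Source first, then the distinct destinations in sorted order
    # (the ordering is an illustrative assumption, not taken from the paper).
    nodes = [source] + sorted(d for d in set(destinations) if d != source)

    # Each segment [lo, hi) of `nodes` is served by the node at index lo,
    # which already holds the message.
    segments = [(0, len(nodes))]
    steps = []
    while any(hi - lo > 1 for lo, hi in segments):
        sends, next_segments = [], []
        for lo, hi in segments:
            if hi - lo <= 1:
                next_segments.append((lo, hi))
                continue
            # The holder keeps the larger half and hands the rest to nodes[mid].
            mid = lo + (hi - lo + 1) // 2
            sends.append((nodes[lo], nodes[mid]))
            next_segments.append((lo, mid))
            next_segments.append((mid, hi))
        steps.append(sends)
        segments = next_segments
    return steps


if __name__ == "__main__":
    for i, step in enumerate(multicast_schedule(0, range(1, 8)), start=1):
        print(f"step {i}: {step}")
```

Because the number of nodes holding the message can at most double in each step, a schedule of this kind reaches d destinations in ⌈log2(d + 1)⌉ steps, the minimum for unicast-based multicast under the one-port assumption. Whether those steps actually proceed without blocking depends on how the messages are routed through the switches, which this sketch leaves out.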