Multistage interconnection networks are a popular class of interconnection architecture for constructing scalable parallel computers (SPCs). This paper focuses on multistage network systems that support wormhole-routed turnaround routing. Existing machines characterized by such a system model include the IBM SP-1 and SP-2, TMC CM-5, and Meiko CS-2.

Efficient collective communication among processor nodes is critical to the performance of SPCs. A system-level multicast service, in which the same message is delivered from a source node to an arbitrary number of destination nodes, is fundamental to supporting collective communication primitives, including application-level broadcast, reduction, and barrier synchronization. This paper addresses how to implement multicast services efficiently in wormhole-routed multistage networks, in the absence of hardware multicast support, by exploiting the properties of the turnaround switching technology. An optimal multicast algorithm is proposed. Results from implementations on a 64-node SP-1 show that the proposed algorithm significantly outperforms the application-level broadcast primitives provided by existing collective communication libraries, including the public-domain MPI.
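
To make the idea of multicast without hardware support concrete, the following is a minimal sketch of a generic unicast-based (software) multicast schedule using recursive doubling: in each step, every node that already holds the message forwards it to one node that does not. It is not the algorithm proposed in the paper. It assumes a one-port model (one send per node per step) and ignores channel contention in the network, which a turnaround-routing-aware algorithm would have to account for when choosing the order of destinations. The function name `multicast_schedule` and the integer node IDs are hypothetical.

```python
def multicast_schedule(source, destinations):
    """Build a step-by-step schedule of unicasts that delivers one message
    from `source` to every node in `destinations`.

    Assumptions (illustrative only): one-port nodes, no model of network
    contention, destinations served in sorted order.
    Returns a list of steps; each step is a list of (sender, receiver) pairs.
    """
    # Nodes that still need the message, in a fixed order.
    pending = sorted(d for d in set(destinations) if d != source)
    holders = [source]  # nodes that already have the message
    steps = []
    while pending:
        step = []
        # Iterate over a snapshot so nodes that receive in this step
        # do not also send in this step.
        for sender in list(holders):
            if not pending:
                break
            receiver = pending.pop(0)
            step.append((sender, receiver))
            holders.append(receiver)
        steps.append(step)
    return steps


if __name__ == "__main__":
    # Broadcast from node 0 to the other 63 nodes of a 64-node system.
    schedule = multicast_schedule(0, range(1, 64))
    for i, step in enumerate(schedule, start=1):
        print(f"step {i}: {len(step)} unicast(s)")
```

Because the set of holders at most doubles in each step, this schedule reaches d destinations in ceil(log2(d + 1)) steps, which is the minimum number of steps for any unicast-based multicast under the one-port assumption. Whether those steps are free of contention on a real network depends on how senders and receivers are paired, which this sketch does not address.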