Deadlock-Free Message Routing in Multiprocessor Interconnection Networks
IEEE Transactions on Computers
High-performance multi-queue buffers for VLSI communications switches
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Deadlock-free multicast wormhole routing in multicomputer networks
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
The network architecture of the Connection Machine CM-5 (extended abstract)
SPAA '92 Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures
Unicast-Based Multicast Communication in Wormhole-Routed Networks
IEEE Transactions on Parallel and Distributed Systems
Meiko CS-2 interconnect Elan-Elite design
Parallel Computing - Special double issue: SUPRENUM and GENESIS
The SP2 high-performance switch
IBM Systems Journal
Pipelined memory shared buffer for VLSI switches
SIGCOMM '95 Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication
Synchronization and communication in the T3E multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Distributed shared memory systems with improved barrier synchronization and data transfer
ICS '97 Proceedings of the 11th international conference on Supercomputing
Proceedings of the 24th annual international symposium on Computer architecture
Simulation of modern parallel systems: a CSIM-based approach
Proceedings of the 29th conference on Winter simulation
Efficient Broadcast and Multicast on Multistage Interconnection Networks Using Multiport Encoding
IEEE Transactions on Parallel and Distributed Systems
Multidestination Message Passing in Wormhole k-ary n-cube Networks with Base Routing Conformed Paths
IEEE Transactions on Parallel and Distributed Systems
Asynchronous Tree-Based Multicasting in Wormhole-Switched MINs
IEEE Transactions on Parallel and Distributed Systems
Interconnection Networks: An Engineering Approach
Interconnection Networks: An Engineering Approach
Optimal software multicast in wormhole-routed multistage networks
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Deadlock-Free Multicast Wormhole Routing in 2-D Mesh Multicomputers
IEEE Transactions on Parallel and Distributed Systems
ICPP '98 Proceedings of the 1998 International Conference on Parallel Processing
A Reliable Hardware Barrier Synchronization Scheme
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
CCL: A Portable and Tunable Collective Communication Library for Scalable Parallel Computers
Proceedings of the 8th International Symposium on Parallel Processing
Performance Metrics and Measurement Techniques of Collective Communication Services
CANPC '97 Proceedings of the First International Workshop on Communication and Architectural Support for Network-Based Parallel Computing
Broadcast/Multicast over Myrinet Using NIC-Assisted Multidestination Messages
CANPC '00 Proceedings of the 4th International Workshop on Network-Based Parallel Computing: Communication, Architecture, and Applications
Multi-address Encoding for Multicast
PCRCW '94 Proceedings of the First International Workshop on Parallel Computer Routing and Communication
Multicast on Irregular Switch-based Networks with Wormhole Routing
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Efficient Broadcast and Multicast on Multistage Interconnnection Networks using Multiport Encoding
SPDP '96 Proceedings of the 8th IEEE Symposium on Parallel and Distributed Processing (SPDP '96)
SPDP '96 Proceedings of the 8th IEEE Symposium on Parallel and Distributed Processing (SPDP '96)
HIPIQS: A High-Performance Switch Architecture using Input Queuing
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Optimal Broadcasting in Mesh-Connected Architectures
Optimal Broadcasting in Mesh-Connected Architectures
Architectural support for efficient communication in scalable parallel systems
Architectural support for efficient communication in scalable parallel systems
Designing efficient communication subsystems for distributed shared memory (dsm) systems
Designing efficient communication subsystems for distributed shared memory (dsm) systems
Efficient Multicast on Irregular Switch-Based Cut-Through Networks with Up-Down Routing
IEEE Transactions on Parallel and Distributed Systems
Dual-tree-based multicasting on wormhole-routed irregular switch-based networks
Journal of Systems Architecture: the EUROMICRO Journal
HiPC'05 Proceedings of the 12th international conference on High Performance Computing
Hi-index | 0.00 |
Multidestination message passing has been proposed as an attractive mechanism for efficiently implementing multicast and other collective operations on direct networks. However, applying this mechanism to switch-based parallel systems is nontrivial. In this paper, we propose alternative switch architectures with differing buffer organizations to implement multidestination worms on switch-based parallel systems. First, we discuss issues related to such implementation (deadlock-freedom, replication mechanisms, header encoding, and routing). Next, we demonstrate how an existing central-buffer-based switch architecture supporting unicast message passing can be enhanced to accommodate multidestination message passing. Similarly, implementing multidestination worms on an input-buffer-based switch architecture is discussed, and two architectural alternatives are presented that reduce the wiring complexity in a practical switch implementation. The central-buffer-based and input-buffer-based implementations are evaluated against each other, as well as against the corresponding software-based schemes. Simulation experiments under a range of traffic (multiple multicast, bimodal, varying degree of multicast, and message length) and system size are used for evaluation. The study demonstrates the superiority of the central-buffer-based switch architecture. It also indicates that under bimodal traffic the central-buffer-based hardware multicast implementation affects background unicast traffic less adversely compared to a software-based multicast implementation. These results show that multidestination message passing can be applied easily and effectively to switch-based parallel systems to deliver good multicast and collective communication performance.