Deadlock-Free Message Routing in Multiprocessor Interconnection Networks
IEEE Transactions on Computers
High-performance multi-queue buffers for VLSI communications switches
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Deadlock-free multicast wormhole routing in multicomputer networks
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
The network architecture of the Connection Machine CM-5 (extended abstract)
SPAA '92 Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures
Unicast-Based Multicast Communication in Wormhole-Routed Networks
IEEE Transactions on Parallel and Distributed Systems
Meiko CS-2 interconnect Elan-Elite design
Parallel Computing - Special double issue: SUPRENUM and GENESIS
The SP2 high-performance switch
IBM Systems Journal
Pipelined memory shared buffer for VLSI switches
SIGCOMM '95 Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication
Synchronization and communication in the T3E multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Optimal software multicast in wormhole-routed multistage networks
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Deadlock-Free Multicast Wormhole Routing in 2-D Mesh Multicomputers
IEEE Transactions on Parallel and Distributed Systems
A Reliable Hardware Barrier Synchronization Scheme
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
CCL: A Portable and Tunable Collective Communication Library for Scalable Parallel Computers
Proceedings of the 8th International Symposium on Parallel Processing
Performance Metrics and Measurement Techniques of Collective Communication Services
CANPC '97 Proceedings of the First International Workshop on Communication and Architectural Support for Network-Based Parallel Computing
Multi-address Encoding for Multicast
PCRCW '94 Proceedings of the First International Workshop on Parallel Computer Routing and Communication
Multidestination Message Passing Mechanism Conforming to Base Wormhole Routing Scheme
PCRCW '94 Proceedings of the First International Workshop on Parallel Computer Routing and Communication
Multicast on Irregular Switch-based Networks with Wormhole Routing
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Efficient Broadcast and Multicast on Multistage Interconnnection Networks using Multiport Encoding
SPDP '96 Proceedings of the 8th IEEE Symposium on Parallel and Distributed Processing (SPDP '96)
SPDP '96 Proceedings of the 8th IEEE Symposium on Parallel and Distributed Processing (SPDP '96)
Optimal Broadcasting in Mesh-Connected Architectures
Optimal Broadcasting in Mesh-Connected Architectures
Efficient Broadcast and Multicast on Multistage Interconnection Networks Using Multiport Encoding
IEEE Transactions on Parallel and Distributed Systems
Multidestination Message Passing in Wormhole k-ary n-cube Networks with Base Routing Conformed Paths
IEEE Transactions on Parallel and Distributed Systems
Multicast snooping: a new coherence method using a multicast address network
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Multiple Multicast with Minimized Node Contention on Wormhole k-ary n-cube Networks
IEEE Transactions on Parallel and Distributed Systems
Asynchronous Tree-Based Multicasting in Wormhole-Switched MINs
IEEE Transactions on Parallel and Distributed Systems
A new switch chip for IBM RS/6000 SP systems
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
IEEE Transactions on Parallel and Distributed Systems
Architectural Support for Efficient Multicasting in Irregular Networks
IEEE Transactions on Parallel and Distributed Systems
Efficient Multicast on Irregular Switch-Based Cut-Through Networks with Up-Down Routing
IEEE Transactions on Parallel and Distributed Systems
HIPIQS: A High-Performance Switch Architecture Using Input Queuing
IEEE Transactions on Parallel and Distributed Systems
HCW '99 Proceedings of the Eighth Heterogeneous Computing Workshop
Performance Enhancement Techniques for InfiniBand" Architecture
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
HIPIQS: A High-Performance Switch Architecture using Input Queuing
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Tree-Based Multicasting in Wormhole-Routed Irregular Topologies
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Exploring IBA Design Space for Improved Performance
IEEE Transactions on Parallel and Distributed Systems
Self-Aware software – will it become a reality?
Self-star Properties in Complex Information Systems
Hi-index | 0.00 |
Multidestination message passing has been proposed as an attractive mechanism for efficiently implementing multicast and other collective operations on direct networks. However, applying this mechanism to switch-based parallel systems is non-trivial. In this paper we propose alternative switch architectures with differing buffer organizations to implement multidestination worms on switch-based parallel systems. First, we discuss issues related to such implementation (deadlock-freedom, replication mechanisms, header encoding, and routing). Next, we demonstrate how an existing central-buffer-based switch architecture supporting unicast message passing can be enhanced to accommodate multidestination message passing. Similarly, implementing multidestination worms on an input-buffer-based switch architecture is discussed. Both of these implementations are evaluated against each other as well as against a software-based scheme using the central buffer organization. Simulation experiments under a range of traffic (multiple multicast, bimodal, varying degree of multicast, and message length) and system size are used for evaluation. The study demonstrates the superiority of the central-buffer-based switch architecture. It also indicates that under bimodal traffic the central-buffer-based hardware multicast implementation affects background unicast traffic less adversely compared to a software-based multicast implementation. Thus, multidestination message passing can easily be applied to switch-based parallel systems to deliver good collective communication performance.