High-performance multi-queue buffers for VLSI communications switches
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
IEEE Transactions on Parallel and Distributed Systems
HIPIQS: A High-Performance Switch Architecture Using Input Queuing
IEEE Transactions on Parallel and Distributed Systems
Tiny Tera: A Packet Switch Core
IEEE Micro
Performance Evaluation of the Quadrics Interconnection Network
Cluster Computing
Adaptive Source Routing in Multistage Interconnection Networks
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
A Reliable Hardware Barrier Synchronization Scheme
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Congestion-Free Routing on the CM-5 Data Router
PCRCW '94 Proceedings of the First International Workshop on Parallel Computer Routing and Communication
NAMD: biomolecular simulation on thousands of processors
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
A Framework for Collective Personalized Communication
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
K-ary N-trees: High Performance Networks for Massively Parallel Architectures
K-ary N-trees: High Performance Networks for Massively Parallel Architectures
Scaling All-to-All Multicast on Fat-tree Networks
ICPADS '04 Proceedings of the Parallel and Distributed Systems, Tenth International Conference
POSE: Getting Over Grainsize in Parallel Discrete Event Simulation
ICPP '04 Proceedings of the 2004 International Conference on Parallel Processing
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Scalable NIC-based Reduction on Large-scale Clusters
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Multicast scheduling for input-queued switches
IEEE Journal on Selected Areas in Communications
Hi-index | 0.00 |
We present an output-queued switch architecture with cross-point buffering that has improved performance for both point-to-point communication and hardware accelerated collective communication. In the past, output queuing architectures have been less popular as they require more internal speedup and buffering. However, with current technology it is possible to build output-queued switches with a relatively large number of ports. We demonstrate that our output-queued architecture performs well for point-to-point messages, specially in a fat-tree topology. We also show that output-queued architectures facilitate efficient implementations of multicasts and reductions. We present performance of multicasts and reductions on individual switches and a network of switches interconnected in a fat-tree topology. We also present simulation results based on synthetic workloads that emulate a molecular dynamics application.