A New Theory of Deadlock-Free Adaptive Routing in Wormhole Networks. IEEE Transactions on Parallel and Distributed Systems.
SIGCOMM '94 Proceedings of the conference on Communications architectures, protocols and applications.
Principles and Practices of Interconnection Networks.
Microarchitecture of a High-Radix Router. Proceedings of the 32nd annual international symposium on Computer Architecture.
Control Path Implementation for a Low-Latency Optical HPC Switch. HOTI '05 Proceedings of the 13th Symposium on High Performance Interconnects.
Rate-based Flow-control for the CICQ Switch. LCN '05 Proceedings of the IEEE Conference on Local Computer Networks 30th Anniversary.
A framework for end-to-end simulation of high-performance computing systems. Proceedings of the 1st international conference on Simulation tools and techniques for communications, networks and systems & workshops.
On credibility of simulation studies of telecommunication networks. IEEE Communications Magazine.
Reliable and efficient hop-by-hop flow control. IEEE Journal on Selected Areas in Communications.
High-radix switches are desirable building blocks for large computer interconnection networks because they convert chip I/O bandwidth into low latency and low cost more effectively than low-radix switches [J. Kim, W.J. Dally, B. Towles, A.K. Gupta, Microarchitecture of a high-radix router, in: Proc. ISCA 2005, Madison, WI, 2005]. Unfortunately, most existing switch architectures do not scale well to a large number of ports; for example, the complexity of the buffered crossbar architecture scales quadratically with the number of ports. Combined with support for long round-trip times and many virtual channels, the resulting buffer requirements make such switches feasible only at modest port counts. Reducing the buffer size, however, drastically increases latency and reduces throughput as long as traditional credit-based flow control is used at the link level. We propose a novel link-level flow control protocol that enables high-performance switches based on the increasingly popular buffered crossbar architecture to scale to higher port counts without sacrificing performance. By combining credited and speculative transmission, the scheme achieves reliable delivery, low latency, and high throughput, even when the crosspoint buffers are significantly smaller than the round-trip time (that is, smaller than the amount of data a link can carry during one round trip). The proposed scheme substantially reduces message latency and improves the throughput of partially buffered crossbar switches under synthetic uniform and non-uniform bursty traffic. Moreover, simulations replaying traces of several typical MPI applications show communication speedups of 2 to 10 times.
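The buffer-scaling argument in the abstract can be made concrete with a back-of-the-envelope calculation. The sketch below assumes, as a simplification not taken from the paper, that every crosspoint of an N x N buffered crossbar stores one round trip's worth of data per virtual channel. It also computes the standard bound that a single credit-controlled flow cannot send more than one buffer's worth of data per round trip. The function names and all parameter values (40 Gb/s links, 500 ns round trip, 4 virtual channels) are hypothetical.

```python
# Back-of-the-envelope model; function names and parameter values are
# hypothetical, chosen only for illustration (not taken from the paper).


def rtt_bytes(rtt_ns: float, link_gbps: float) -> float:
    """Data in flight during one round trip: Gb/s * ns = bits, /8 = bytes."""
    return rtt_ns * link_gbps / 8


def crosspoint_buffer_bytes(ports: int, rtt_ns: float, link_gbps: float, vcs: int) -> float:
    """Total memory of an N x N buffered crossbar, assuming each of the N*N
    crosspoints stores one round trip's worth of data per virtual channel."""
    return ports * ports * vcs * rtt_bytes(rtt_ns, link_gbps)


def credit_utilization_bound(buffer_bytes: float, rtt_ns: float, link_gbps: float) -> float:
    """Upper bound on link utilization for one flow under credit flow control:
    at most one buffer's worth of data can be sent per round trip."""
    return min(1.0, buffer_bytes / rtt_bytes(rtt_ns, link_gbps))


if __name__ == "__main__":
    # Hypothetical 40 Gb/s links with a 500 ns round trip and 4 virtual channels.
    for n in (32, 64, 128):
        mb = crosspoint_buffer_bytes(n, rtt_ns=500, link_gbps=40, vcs=4) / 1e6
        print(f"{n:4d} ports: {mb:7.1f} MB of crosspoint memory")
    # A crosspoint buffer of a quarter round trip caps a credited flow at 25%.
    print(credit_utilization_bound(625, rtt_ns=500, link_gbps=40))  # 0.25
```

Doubling the port count quadruples the crosspoint memory, which is the quadratic growth mentioned above; shrinking each buffer below one round trip instead caps what a purely credited flow can achieve.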
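The idea of combining credited and speculative transmission can be illustrated with a toy slotted simulation of one sender feeding one crosspoint buffer. This is not the protocol from the paper: the buffer partitioning, the retransmission policy, the NAK message, and the output-contention pattern are all assumptions made for illustration, and `simulate` is a hypothetical helper. The sender uses a credit when it has one, which guarantees buffer space; otherwise it transmits speculatively. A speculative packet is accepted only if the (assumed) speculative part of the buffer has room; otherwise it is dropped, a NAK returns to the sender, and the packet is retransmitted, so every packet is eventually delivered.

```python
from collections import deque


def simulate(n_packets, buf, credited_slots, one_way, speculative, max_slots=100_000):
    """Toy slotted model of one sender feeding one crosspoint buffer (hypothetical).

    The buffer holds `buf` packets: `credited_slots` are guarded by credits and the
    rest accept speculative packets (an assumed partitioning). Returns
    (slot of last delivery, delivery order, number of dropped packets).
    """
    spec_slots = buf - credited_slots
    credits = credited_slots        # credits currently held by the sender
    new = deque(range(n_packets))   # packets not yet transmitted
    retx = deque()                  # dropped speculative packets awaiting resend
    link = deque()                  # (arrival slot, seq, credited?) toward the buffer
    back = deque()                  # (arrival slot, "credit" | "nak", seq) toward the sender
    fifo = deque()                  # crosspoint buffer contents: (seq, credited?)
    spec_occ, drops, delivered = 0, 0, []

    for t in range(max_slots):
        # 1. Credits and NAKs reaching the sender in this slot.
        while back and back[0][0] <= t:
            _, kind, seq = back.popleft()
            if kind == "credit":
                credits += 1
            else:
                retx.append(seq)

        # 2. Send at most one packet per slot, retransmissions first.
        queue = retx if retx else new
        if queue and credits > 0:
            credits -= 1
            link.append((t + one_way, queue.popleft(), True))
        elif queue and speculative:
            link.append((t + one_way, queue.popleft(), False))

        # 3. Packets reaching the crosspoint buffer.
        while link and link[0][0] <= t:
            _, seq, credited = link.popleft()
            if credited:
                fifo.append((seq, True))      # space was reserved by the credit
            elif spec_occ < spec_slots:
                spec_occ += 1
                fifo.append((seq, False))     # speculative packet accepted
            else:
                drops += 1                    # no room: drop and send a NAK back
                back.append((t + one_way, "nak", seq))

        # 4. Output port: assumed busy with other traffic in every third slot.
        if fifo and t % 3 != 0:
            seq, credited = fifo.popleft()
            delivered.append(seq)
            if credited:
                back.append((t + one_way, "credit", seq))  # credit returns to sender
            else:
                spec_occ -= 1                 # speculative space freed locally

        if len(delivered) == n_packets:
            return t, delivered, drops
    raise RuntimeError("simulation did not finish")


t_credit, order_c, _ = simulate(200, buf=4, credited_slots=4, one_way=8, speculative=False)
t_hybrid, order_h, drops = simulate(200, buf=4, credited_slots=2, one_way=8, speculative=True)
print(f"credit only: {t_credit} slots; credited + speculative: {t_hybrid} slots, {drops} drops")
```

With these hypothetical parameters the credit-only link can send at most four packets per round trip, whereas the hybrid link is limited mainly by the output's availability, at the cost of some drops and retransmissions.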