Congestion arises in cluster-based supercomputers due to contention for links, spreads due to oversubscription of communication resources, and reduces performance. We mitigate it using efficient, scalable adaptive routing and explicit rate calculation. We use virtual circuits for in-order packet delivery; path setup is performed by switches locally, with no blocking or backtracking. For random permutations in a slightly enriched fat-tree topology, maximum contention is reduced by up to 50% relative to static routing, but only rate control can translate this into actual gain. Unfortunately, TCP's window-based rate control fails because of the low bandwidth-delay product; moreover, small buffers cause congestion spreading even with a single-packet window. InfiniBand's CCA employs multiple parameters, which must apparently be tuned per topology and traffic pattern. Focusing on phase-based applications, we present a distributed explicit rate-assignment algorithm that minimizes the completion time of the communication phase (min-max flow completion). We also present a generally applicable packet-injection scheme for a source with flows of different rates, which realizes the desired rates even with very small switch buffers. Simulations show that adaptive routing alone is ineffective and that rate control alone is of limited effectiveness, yet together they shorten the communication phase by tens of percent. Finally, our explicit rate-calculation algorithm is faster than current reactive schemes.
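To make the min-max flow completion objective concrete, the following is a minimal centralized sketch, not the distributed algorithm the paper describes. It assumes that each flow has a known size and a fixed path, and that every link has a known capacity; the function name, the data layout, and the example values are all hypothetical. Under those assumptions, the communication phase cannot end before the most heavily loaded link has drained, and assigning each flow a rate proportional to its size lets all flows finish at exactly that time.

```python
from collections import defaultdict


def min_max_completion_rates(flows, capacity):
    """Illustrative only: centralized rate assignment for fixed paths.

    flows:    dict flow_id -> (size, [link_id, ...])   (hypothetical layout)
    capacity: dict link_id -> link capacity (size units per time unit)
    Returns (T, rates), where T is the phase completion time and
    rates[flow_id] is the constant rate assigned to that flow.
    Assumes at least one flow with a non-empty path.
    """
    # Total amount of data that must cross each link during the phase.
    load = defaultdict(float)
    for size, path in flows.values():
        for link in path:
            load[link] += size

    # Lower bound on the phase duration: the slowest link to drain.
    T = max(load[link] / capacity[link] for link in load)

    # Rates proportional to size make every flow finish at time T
    # without exceeding the capacity of any link.
    rates = {fid: size / T for fid, (size, _) in flows.items()}
    return T, rates


# Hypothetical example: three flows sharing two unit-capacity links.
example_flows = {
    "a": (4, ["L1", "L2"]),
    "b": (2, ["L2"]),
    "c": (3, ["L1"]),
}
example_capacity = {"L1": 1.0, "L2": 1.0}
T, rates = min_max_completion_rates(example_flows, example_capacity)
```

In this example link L1 carries 7 units and L2 carries 6, so L1 determines the phase length. The sketch ignores path selection, buffer sizes, and packet-level injection, all of which the abstract treats as separate concerns.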