With the growing popularity of big-data applications, data center networks increasingly carry larger and longer traffic flows. Because of this coarser flow granularity, static routing cannot load-balance traffic efficiently, which increases network contention and reduces throughput. Adaptive routing can solve this load-balancing problem, but network designers refrain from using it because it also causes out-of-order packet delivery, which can significantly degrade reliable-transport performance for the longer flows. In this paper, we show that when the bandwidth of each flow is throttled to half of the network link capacity, a distributed-adaptive-routing algorithm converges to a non-blocking routing assignment within a few iterations, causing minimal out-of-order packet delivery. We present a Markov chain model for distributed-adaptive-routing in the context of Clos networks that approximates the expected convergence time. The model predicts that for full-link-bandwidth traffic the convergence time grows exponentially with the network size, so out-of-order packet delivery is unavoidable for long messages. With half-rate traffic, however, the algorithm converges within a few iterations and depends only weakly on the network size. We therefore show that distributed-adaptive-routing can provide scalable, non-blocking routing even for long flows on a rearrangeably-non-blocking Clos network under half-rate conditions. We evaluate the proposed model and find that it approximately fits an abstract system simulation model. Hardware implementation guidelines are provided and evaluated using a detailed flit-level InfiniBand simulation model. These results apply directly to adaptive-routing systems designed and deployed in various fields.
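The contrast between full-rate and half-rate traffic can be explored with a small experiment. The sketch below is a toy model written for illustration only; it is not the paper's Markov chain model, its simulation model, or its routing algorithm. It assumes a three-stage Clos network with `n` hosts per leaf switch, `n` middle switches, and `r` leaf switches, random permutation traffic, and a re-routing rule in which every overloaded link keeps as many flows as it can carry and asks the rest to pick a new middle switch uniformly at random. The function name `iterations_to_converge` and all of its parameters are hypothetical. Setting `flows_per_link=1` stands for full-rate flows, and `flows_per_link=2` for flows throttled to half the link capacity.

```python
import random
from collections import defaultdict


def iterations_to_converge(n, r, flows_per_link, seed, max_iters=10_000):
    """Toy model (an assumption, not the paper's algorithm).

    n              hosts per leaf switch, also the number of middle switches
    r              number of leaf switches
    flows_per_link flows one link can carry (1 = full rate, 2 = half rate)
    Returns the number of re-routing iterations performed before no link
    is overloaded, or None if that does not happen within max_iters.
    """
    rng = random.Random(seed)
    hosts = n * r

    # Random permutation traffic: host h sends one flow to host dst[h].
    dst = list(range(hosts))
    rng.shuffle(dst)
    src_leaf = [h // n for h in range(hosts)]
    dst_leaf = [d // n for d in dst]

    # Each flow starts on a uniformly random middle switch.
    middle = [rng.randrange(n) for _ in range(hosts)]

    for iteration in range(max_iters + 1):
        # Group flows by the uplink and the downlink they occupy.
        up = defaultdict(list)
        down = defaultdict(list)
        for f in range(hosts):
            up[(src_leaf[f], middle[f])].append(f)
            down[(middle[f], dst_leaf[f])].append(f)

        # Every overloaded link asks its excess flows to move.
        movers = set()
        for flows in list(up.values()) + list(down.values()):
            excess = len(flows) - flows_per_link
            if excess > 0:
                movers.update(rng.sample(flows, excess))

        if not movers:
            return iteration  # non-blocking assignment reached

        # Moved flows pick a new middle switch at random.
        for f in sorted(movers):
            middle[f] = rng.randrange(n)

    return None


if __name__ == "__main__":
    for capacity, label in ((2, "half rate"), (1, "full rate")):
        result = iterations_to_converge(n=8, r=8, flows_per_link=capacity, seed=1)
        print(label, "->", result)
```

Running the script for several seeds and for growing `n` and `r` gives a rough feel for how the two settings scale in this toy model. A result of `None` only means the iteration budget ran out, and the numbers should not be read as a reproduction of the paper's results.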