Overlay networks have gained prominence as a communication medium for parallel and distributed applications, especially in Grid environments. However, little attention has been given to providing both high throughput and reliable communication on overlays. The core of the problem is that intermediate nodes have limited buffer memory, while forwarding throughput must reach gigabit-per-second rates. Moreover, naive flow control can deadlock the overlay. High-performance flow control on overlays is therefore a critical concern in heterogeneous wide-area networks, where input and output link throughput can vary significantly. We propose an overlay scheme that couples TCP connections through fixed-size intermediate buffers and adapts deadlock-free routing to overlay routing in heterogeneous wide-area networks. The fixed buffer memory eliminates memory overflows at forwarding nodes, and a deadlock-free routing algorithm, which resolves the challenges of adapting such routing to heterogeneous wide-area networks, eliminates deadlocks. Our overlay construction and routing optimizations take the latency and bandwidth of the underlying network into account. Simulation on 13 clusters (515 nodes) and evaluation on 7 clusters (170 nodes) show that our deadlock-free routing incurs negligible overhead compared with deadlock-unaware routing and performs comparably to direct communication. We further demonstrate that, for certain collective communication operations, our overlay even outperforms direct communication by mitigating or completely avoiding network contention. We show this on systems ranging from a single-switch cluster with 36 nodes to a Grid environment with 4 clusters and 291 nodes.
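Why naive flow control can deadlock: when every forwarding node has a fixed buffer, a message holding buffer space on one overlay link may wait for space on the next link, and a circular chain of such waits never resolves. A classic sufficient condition for deterministic routing, due to Dally and Seitz, is that the channel dependency graph is acyclic. This graph has one vertex per directed link and an edge whenever some route uses one link immediately followed by another. The Python sketch below only illustrates that condition under our own assumptions. It treats each directed overlay TCP connection as a channel and takes routes as hypothetical lists of node names. The helper names `channel_dependency_graph` and `has_cycle` are ours. It is not the routing algorithm proposed in this work.

```python
from collections import defaultdict


def channel_dependency_graph(routes):
    """Build a channel dependency graph from overlay routes (illustrative only).

    Each route is a list of node ids; each consecutive pair (u, v) is a
    directed overlay link, i.e. a channel. If a route uses link a and then
    link b, a message holding buffer space on a may wait for space on b,
    so we record the dependency a -> b.
    """
    deps = defaultdict(set)
    for path in routes:
        links = list(zip(path, path[1:]))
        for a, b in zip(links, links[1:]):
            deps[a].add(b)
    return deps


def has_cycle(deps):
    """Return True if the dependency graph has a directed cycle (DFS coloring)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every channel starts WHITE (unvisited)

    def visit(node):
        color[node] = GRAY  # on the current DFS path
        for nxt in deps.get(node, ()):
            if color[nxt] == GRAY:  # back edge: circular wait is possible
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK  # fully explored, no cycle through this node
        return False

    return any(color[n] == WHITE and visit(n) for n in list(deps))


# Hypothetical three-node ring where every route goes two hops clockwise.
cyclic_routes = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]
# Same ring, but C reaches B over the direct reverse link instead.
acyclic_routes = [["A", "B", "C"], ["B", "C", "A"], ["C", "B"]]

print(has_cycle(channel_dependency_graph(cyclic_routes)))   # True
print(has_cycle(channel_dependency_graph(acyclic_routes)))  # False
```

In the clockwise-only case the dependencies (A,B) to (B,C) to (C,A) to (A,B) close into a cycle, so fixed buffers could fill in a circular wait; rerouting one flow breaks the cycle. Deadlock-free schemes such as up*/down* routing achieve the same effect systematically by forbidding certain link transitions rather than by fixing individual routes.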