Many existing clusters use inexpensive Gigabit Ethernet and often have multiple interface cards to improve bandwidth and enhance fault tolerance. We investigate the use of Concurrent Multipath Transfer (CMT), an extension to the Stream Control Transmission Protocol (SCTP), to take advantage of multiple network interfaces in MPI programs. We evaluate the performance of our system with microbenchmarks and MPI collective routines. We also compare our method, which employs CMT at the transport layer in the operating system kernel, to existing systems that support multi-railing in the middleware. We discuss performance with respect to bandwidth, latency, congestion control, and fault tolerance.