End-to-end fault tolerance using transport layer multihoming

  • Authors:
  • Armando L. Caro, Jr.; Paul D. Amer

  • Affiliations:
  • University of Delaware; University of Delaware

  • Venue:
  • Dissertation, University of Delaware
  • Year:
  • 2005

Abstract

This dissertation investigates the use of transport layer multihoming to provide end-to-end network fault tolerance and improve application performance. Transport layer multihoming binds a single transport layer association to multiple network addresses at each endpoint, allowing the two end hosts to communicate over multiple network paths. This path redundancy supports fault tolerance: traffic on an existing connection can be redirected (i.e., failed over) to a peer's alternate network address without applications (or users) having to abort and re-establish the connection. Given the prevalence of path outages in today's Internet, multihoming support at the transport layer can improve the resilience of established connections.

Using the Stream Control Transmission Protocol (SCTP), we investigate possible design decisions for a multihomed transport protocol and provide insight for future transport protocols that support multihoming. In particular, we investigate retransmission policies and failover mechanisms in two contexts: networks running proactive routing protocols (fixed infrastructure networks) and networks running reactive routing protocols (mobile ad hoc networks). Retransmission policies govern a sender's behavior when it fails to receive acknowledgments for sent data. Failover mechanisms determine under which conditions a path is presumed failed, when a sender migrates to a new path, and whether and when the sender resumes new data transmission on the original path. We provide a decision tree that suggests a retransmission policy and failover mechanism based on expected network conditions. Our results uncover an important design principle for multihomed transport protocols: the traditionally conservative failover techniques used in routing do not apply when path redundancy begins at the end hosts and is handled by the transport layer.
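The failover mechanism described above can be pictured as a small per-association state machine. This is a minimal sketch, not the dissertation's implementation: the names `Path`, `Association`, `on_timeout`, `on_ack`, `retransmission_target`, and the `failover_threshold` parameter are hypothetical, modeled loosely on SCTP's per-destination error counter and its Path.Max.Retrans limit. A path is presumed failed once its consecutive timeouts exceed the threshold; new data then migrates to an active alternate address and, under one assumed resumption policy, returns to the primary as soon as the primary is acknowledged again.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """Hypothetical per-destination state for one of the peer's addresses."""
    address: str
    error_count: int = 0  # consecutive timeouts observed on this path
    active: bool = True


class Association:
    """Sketch of failover for one association to a multihomed peer.

    Assumption: a path is presumed failed when its consecutive timeouts
    exceed failover_threshold (modeled on SCTP's Path.Max.Retrans).
    """

    def __init__(self, addresses, failover_threshold=5):
        self.paths = [Path(a) for a in addresses]
        self.primary = self.paths[0]
        self.current = self.primary  # path that carries new data
        self.failover_threshold = failover_threshold

    def _alternate(self):
        # First active path other than the one currently carrying new data.
        for p in self.paths:
            if p.active and p is not self.current:
                return p
        return None

    def on_timeout(self, path):
        path.error_count += 1
        if path.error_count > self.failover_threshold:
            path.active = False
            if path is self.current:
                alt = self._alternate()
                if alt is not None:  # no active alternate: stay on this path
                    self.current = alt

    def on_ack(self, path):
        # A data or heartbeat ack clears the error count and revives the path.
        path.error_count = 0
        path.active = True
        # Assumed resumption policy: move new data back to the primary at once.
        if path is self.primary:
            self.current = self.primary

    def retransmission_target(self):
        # Retransmit to an active alternate if one exists (cf. RFC 4960 sec. 6.4).
        return self._alternate() or self.current


assoc = Association(["10.0.0.1", "192.168.1.1"], failover_threshold=2)
for _ in range(3):  # three consecutive timeouts on the primary
    assoc.on_timeout(assoc.primary)
print(assoc.current.address)  # 192.168.1.1
assoc.on_ack(assoc.primary)   # primary reachable again
print(assoc.current.address)  # 10.0.0.1
```

Sending retransmissions to an alternate address while new data stays on the current path is only one retransmission policy; retransmitting on the same path is another, and resuming on the primary immediately is only one resumption choice. Selecting among such options based on expected network conditions is what the decision tree in the abstract is described as doing.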
Because failovers at the routing layer are transparent to the transport layer, routing-layer failover thresholds must be conservative to avoid oscillations that could leave the transport layer with inaccurate path metrics (round-trip time (RTT), congestion window (cwnd), and slow-start threshold (ssthresh)). In contrast, a multihomed transport layer is fully aware of failover events and can maintain separate metrics for each path. As a result, transport layer multihoming can improve performance through aggressive failovers that reduce stalls during network congestion and failure events.
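The cost of a conservative failover threshold can be estimated with simple arithmetic. The sketch below assumes SCTP-style timer behavior: the retransmission timeout (RTO) starts at 1 s (SCTP's RTO.Min), doubles after each timeout, is capped at 60 s (SCTP's default RTO.Max), and the path is declared failed after `threshold + 1` consecutive timeouts. The function name `stall_before_failover` is hypothetical, and the numbers are illustrative rather than results reported in the dissertation.

```python
def stall_before_failover(rto_initial, rto_max, threshold):
    """Seconds spent waiting on timeouts before a path is presumed failed.

    Assumptions (illustrative, SCTP-style): the RTO starts at rto_initial,
    doubles after every timeout (exponential backoff), is capped at rto_max,
    and the path is declared failed after threshold + 1 consecutive timeouts.
    """
    total, rto = 0.0, rto_initial
    for _ in range(threshold + 1):
        total += rto
        rto = min(2 * rto, rto_max)
    return total


for threshold in (5, 1):  # 5 is SCTP's default Path.Max.Retrans
    print(f"threshold={threshold}: {stall_before_failover(1.0, 60.0, threshold)} s")
# threshold=5: 63.0 s
# threshold=1: 3.0 s
```

Under these assumptions, a threshold of 5 stalls new data for about 63 s of timeouts before failing over, while a threshold of 1 cuts the stall to 3 s. Because a multihomed transport keeps RTT, cwnd, and ssthresh (and, in SCTP, the RTO) per path, the backed-off RTO of the failed path does not carry over to the alternate path, which is consistent with the argument above that aggressive failover suits the transport layer.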