Improving the reliability of internet paths with one-hop source routing

  • Authors:
  • Krishna P. Gummadi;Harsha V. Madhyastha;Steven D. Gribble;Henry M. Levy;David Wetherall

  • Affiliations:
  • Department of Computer Science & Engineering, University of Washington;Department of Computer Science & Engineering, University of Washington;Department of Computer Science & Engineering, University of Washington;Department of Computer Science & Engineering, University of Washington;Department of Computer Science & Engineering, University of Washington

  • Venue:
  • OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
  • Year:
  • 2004

Quantified Score

Hi-index 0.02

Visualization

Abstract

Recent work has focused on increasing availability in the face of Internet path failures. To date, proposed solutions have relied on complex routing and path-monitoring schemes, trading scalability for availability among a relatively small set of hosts. This paper proposes a simple, scalable approach to recover from Internet path failures. Our contributions are threefold. First, we conduct a broad measurement study of Internet path failures on a collection of 3,153 Internet destinations consisting of popular Web servers, broad-band hosts, and randomly selected nodes. We monitored these destinations from 67 PlanetLab vantage points over a period of seven days, and found availabilities ranging from 99.6% for servers to 94.4% for broadband hosts. When failures do occur, many appear too close to the destination (e.g., last-hop and end-host failures) to be mitigated through alternative routing techniques of any kind. Second, we show that for the failures that can be addressed through routing, a simple, scalable technique, called one-hop source routing, can achieve close to the maximum benefit available with very low overhead. When a path failure occurs, our scheme attempts to recover from it by routing indirectly through a small set of randomly chosen intermediaries. Third, we implemented and deployed a prototype one-hop source routing infrastructure on PlanetLab. Over a three day period, we repeatedly fetched documents from 982 popular Internet Web servers and used one-hop source routing to attempt to route around the failures we observed. Our results show that our prototype successfully recovered from 56% of network failures. However, we also found a large number of server failures that cannot be addressed through alternative routing. Our research demonstrates that one-hop source routing is easy to implement, adds negligible overhead, and achieves close to the maximum benefit available to indirect routing schemes, without the need for path monitoring, history, or a-priori knowledge of any kind.