Algebra-based scalable overlay network monitoring: algorithms, evaluation, and applications

  • Authors:
  • Yan Chen; David Bindel; Han Hee Song; Randy H. Katz

  • Affiliations:
  • Northwestern University, Technical Institute, Evanston, IL; Department of Mathematics, Courant Institute of Mathematical Sciences, New York University, New York, NY; University of Texas at Austin, TX; University of California at Berkeley, Berkeley, CA

  • Venue:
  • IEEE/ACM Transactions on Networking (TON)
  • Year:
  • 2007

Abstract

Overlay network monitoring enables distributed Internet applications to detect and recover from path outages and periods of degraded performance within seconds. For an overlay network with n end hosts, existing systems either require O(n^2) measurements, and thus lack scalability, or can estimate only latency, not congestion or failures. Our earlier extended abstract [Y. Chen, D. Bindel, and R. H. Katz, "Tomography-based overlay network monitoring," Proceedings of the ACM SIGCOMM Internet Measurement Conference (IMC), 2003] briefly proposes an algebraic approach that selectively monitors k linearly independent paths that can fully describe all the O(n^2) paths. The loss rates and latency of these k paths can be used to estimate the loss rates and latency of all other paths. Our scheme assumes only knowledge of the underlying IP topology, with links dynamically varying between lossy and normal. In this paper, we improve, implement, and extensively evaluate such a monitoring system. We further make the following contributions: i) scalability analysis indicating that for reasonably large n (e.g., 100), the growth of k is bounded as O(n log n), ii) efficient adaptation algorithms for topology changes, such as the addition or removal of end hosts and routing changes, iii) measurement load balancing schemes, iv) topology measurement error handling, and v) design and implementation of an adaptive streaming media system as a representative application. Both simulation and Internet experiments demonstrate that we obtain highly accurate path loss rate estimation while adapting to topology changes within seconds and handling topology errors.
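
The core idea of monitoring only k linearly independent paths can be illustrated with a small sketch. It is not the authors' implementation: the four-host star topology, the link loss rates, and the function names `select_basis` and `estimate_loss` are hypothetical. It also assumes that link losses are independent, so that the logarithm of a path's delivery rate is the sum of the logarithms of its links' delivery rates. Plain Gaussian elimination stands in here for whatever decomposition the paper actually uses.

```python
from fractions import Fraction
import math

def select_basis(G):
    """Pick linearly independent rows of the path matrix G (paths x links).

    Returns (monitored, coeffs): `monitored` lists the indices of the chosen
    rows, and coeffs[i] maps monitored indices to the coefficients that
    express each remaining row i as a combination of monitored rows.
    """
    basis = []      # (pivot column, reduced row, its combination of original rows)
    monitored = []
    coeffs = {}
    for i, row in enumerate(G):
        r = [Fraction(v) for v in row]
        combo = {}  # invariant: r == G[i] - sum(combo[j] * G[j])
        for pivot, red, red_combo in basis:
            if r[pivot] != 0:
                f = r[pivot] / red[pivot]
                r = [a - f * b for a, b in zip(r, red)]
                for j, c in red_combo.items():
                    combo[j] = combo.get(j, 0) + f * c
        pivot = next((c for c, v in enumerate(r) if v != 0), None)
        if pivot is None:
            coeffs[i] = combo            # row i is redundant: no probing needed
        else:
            red_combo = {j: -c for j, c in combo.items()}
            red_combo[i] = Fraction(1)
            basis.append((pivot, r, red_combo))
            monitored.append(i)
    return monitored, coeffs

def estimate_loss(measured, coeffs):
    """Infer loss rates of unmonitored paths from the monitored ones."""
    estimates = {}
    for i, combo in coeffs.items():
        log_delivery = sum(float(c) * math.log(1.0 - measured[j])
                           for j, c in combo.items())
        estimates[i] = 1.0 - math.exp(log_delivery)
    return estimates

def path_loss(row, link_loss):
    """Ground-truth loss of a path, assuming independent link losses."""
    delivery = 1.0
    for uses_link, loss in zip(row, link_loss):
        if uses_link:
            delivery *= 1.0 - loss
    return 1.0 - delivery

# Hypothetical star topology: hosts A, B, C, D each attach to one router,
# so there are 4 links and 6 host-to-host paths (AB, AC, AD, BC, BD, CD).
G = [
    [1, 1, 0, 0],  # AB
    [1, 0, 1, 0],  # AC
    [1, 0, 0, 1],  # AD
    [0, 1, 1, 0],  # BC
    [0, 1, 0, 1],  # BD
    [0, 0, 1, 1],  # CD
]
link_loss = [0.01, 0.05, 0.0, 0.10]  # made-up per-link loss rates

true_loss = [path_loss(row, link_loss) for row in G]
monitored, coeffs = select_basis(G)
measured = {i: true_loss[i] for i in monitored}  # stands in for real probes
estimated = estimate_loss(measured, coeffs)
```

In this toy case only four of the six paths need to be probed, and the loss rates of the remaining two follow from the measured ones. Exact rational arithmetic (`Fraction`) is used for the elimination to avoid rank misjudgments from floating-point round-off; a system at the scale the abstract describes would need a numerically robust factorization instead.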