The Dynamics of Performance Collapse in Large-Scale Networks and Computers

  • Authors: Neil J. Gunther

  • Affiliations: Performance Dynamics Consulting, Castro Valley, California

  • Venue: International Journal of High Performance Computing Applications

  • Year: 2000

Abstract

Failures in communication networks have become all too familiar. The AT&T phone system was brought to its knees in 1990, and AT&T's frame-relay network shut down West Coast bank ATMs in 1998. These were hard network failures. Less obvious is the congestive failure of packet networks, which occurs even without hard errors. This spontaneous collapse in performance is observed as an orders-of-magnitude drop in packets per second delivered or as increased packet delay. Such effects were seen on the Internet in 1986 and led to the implementation of the TCP/IP "slow-start" congestion avoidance algorithm. That same algorithm is now responsible for latency overhead in HTTP traffic on the World Wide Web. This paper outlines an approach developed by the author, based on the surprising observation that estimating the degree of instability in computer networks is logically equivalent to estimating the rate of decay of an unstable (radioactive) atom, which suggests applying the Feynman path integral from quantum mechanics. The advantage of this approach is threefold: (a) it makes the dynamics of large transients intuitively clear, (b) it furnishes an estimator for the mean time to collapse, and (c) it provides corrections to other estimators (e.g., Catastrophe Theory and the Theory of Large Deviations).