Performance Sensitivity of Routing Algorithms to Failures in Networks of Worksations

  • Authors:
  • Javier Molero;Federico Silla;Vicente Santonja;José Duato

  • Affiliations:
  • -;-;-;-

  • Venue:
  • ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
  • Year:
  • 2000

Quantified Score

Hi-index 0.00

Visualization

Abstract

Networks of workstations (NOWs) are becoming an increasingly popular alternative to parallel computers for those applications with high needs of resources such as memory capacity and input/output storage space, and also for small scale parallel computing. Although the mean time between failures (MTBF) for individual links and switches in a NOW is very high, the probability of a failure occurrence dramatically increases as the network size becomes larger. Moreover, there are external factors, such as accidental link disconnections, that also can affect the overall NOW reliability. Until the faulty element is replaced, the NOW is functioning in a degraded mode. Thus, it becomes necessary to quantify how much the global NOW performance is reduced during the time the system remains in this state. In this paper we analyze the performance degradation of networks of workstations when failures in links or switches occur. Because the routing algorithm is a key issue in the design of a NOW, we quantify the sensitivity to failures of two routing algorithms: up*/down* and minimal adaptive routing algorithms. Simulation results show that, in general, up*/down* routing is highly robust to failures. On the other hand, the minimal adaptive routing algorithm presents a better performance, even in the presence of failures, but at the expense of a larger sensitivity.