Dependability Analysis of a Fault-Tolerant Network Reconfiguring Strategy

  • Authors:
  • Vicente Chirivella; Rosa Alcover; José Flich; José Duato

  • Affiliations:
  • Department of Statistics and Operation Research; Department of Statistics and Operation Research; Department of Information Systems and Computer Architecture, Universidad Politécnica de Valencia, Valencia, Spain 46022; Department of Information Systems and Computer Architecture, Universidad Politécnica de Valencia, Valencia, Spain 46022

  • Venue:
  • Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
  • Year:
  • 2009

Abstract

Fault tolerance mechanisms become indispensable as the number of processors in large systems increases. The effectiveness of such mechanisms must therefore be measured before they are implemented, which requires understanding how different network parameters affect dependability parameters such as mean time to network failure and availability. In this paper we analyse these effects in detail using a methodology we proposed previously, based on Markov chains and Analysis of Variance techniques. As a case study, we analyse the effects of network size, mean time to node failure, mean time to node repair, mean time to network repair, and failure coverage in a 2D mesh network with a fault-tolerant mechanism (similar to the one used in the BlueGene/L system) that can remove rows and/or columns in the presence of failures.
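To illustrate how a Markov chain can relate the parameters named in the abstract to mean time to network failure, here is a minimal sketch. It is not the model from the paper: the state space (number of failed nodes currently tolerated), the per-node exponential failure and repair rates, the single repair facility, and the names `mean_time_to_network_failure`, `solve_linear`, and `max_faults` are all assumptions made for illustration. An uncovered failure, or any failure beyond `max_faults`, is assumed to bring the network down. Mean time to network repair is left out because it only matters for availability, not for the time to first failure.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mean_time_to_network_failure(nodes, mttf_node, mttr_node, coverage, max_faults):
    """Expected time to absorption in a hypothetical birth-death chain.

    State i = number of failed nodes currently tolerated (0..max_faults).
    Assumes nodes > max_faults so every state has a positive failure rate.
    """
    lam = 1.0 / mttf_node  # per-node failure rate (assumed exponential)
    mu = 1.0 / mttr_node   # repair rate, one repair at a time (assumed)
    size = max_faults + 1
    a = [[0.0] * size for _ in range(size)]
    b = [1.0] * size
    for i in range(size):
        fail_rate = (nodes - i) * lam
        repair_rate = mu if i > 0 else 0.0
        # Balance equation: r_i*T_i - c*fail*T_{i+1} - mu*T_{i-1} = 1
        a[i][i] = fail_rate + repair_rate
        if i < max_faults:
            a[i][i + 1] = -coverage * fail_rate  # covered failure is tolerated
        if i > 0:
            a[i][i - 1] = -repair_rate
    return solve_linear(a, b)[0]  # start with all nodes working


if __name__ == "__main__":
    for c in (0.90, 0.99):
        t = mean_time_to_network_failure(64, 1000.0, 10.0, c, 3)
        print(f"coverage={c:.2f}  MTTF_network={t:.1f}")
```

Evaluating such a function over a grid of parameter values gives the kind of response data on which an Analysis of Variance could then be run to rank the influence of each factor.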