Using failure injection mechanisms to experiment and evaluate a grid failure detector

  • Authors:
  • Sébastien Monnet; Marin Bertier

  • Affiliations:
  • IRISA/University of Rennes I; IRISA/INSA

  • Venue:
  • VECPAR'06: Proceedings of the 7th International Conference on High Performance Computing for Computational Science
  • Year:
  • 2006

Abstract

Computing grids are large-scale, highly distributed, often hierarchical platforms. At such scales, failures are no longer exceptions but part of normal behavior, so developers designing software for grids have to take them into account. It is crucial to run experiments at large scale, under various volatility conditions, in order to measure the impact of failures on the whole system. This paper presents an experimental tool that allows the user to inject failures during a practical evaluation of fault-tolerant systems. We illustrate the usefulness of our tool through an evaluation of a hierarchical grid failure detector.
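The abstract does not describe the tool's interface, so the following is only a minimal sketch of the general idea, not the authors' tool or their detector. It assumes a hypothetical injection model where each node's time-to-failure is drawn from an exponential distribution with mean `mtbf`, and a flat (non-hierarchical) heartbeat/timeout detector with instantaneous, lossless messages. The names `inject_failures`, `run_detector`, `heartbeat_period`, and `timeout` are illustrative assumptions.

```python
import random
from dataclasses import dataclass

INF = float("inf")


@dataclass
class Node:
    name: str
    crash_time: float  # simulated time of the injected crash (INF = never fails)


def inject_failures(names, mtbf, horizon, seed=0):
    """Hypothetical injection model: each node gets an exponentially
    distributed time-to-failure with mean `mtbf`. Draws at or beyond
    `horizon` mean the node survives the whole experiment."""
    rng = random.Random(seed)  # seeded so an experiment can be replayed
    nodes = []
    for name in names:
        t = rng.expovariate(1.0 / mtbf)
        nodes.append(Node(name, t if t < horizon else INF))
    return nodes


def run_detector(nodes, heartbeat_period, timeout, horizon):
    """Flat heartbeat/timeout detector (assumed, not the paper's design).

    Every live node sends a heartbeat at times 0, p, 2p, ...; the monitor
    suspects a node `timeout` units after its last heartbeat. Returns
    {name: detection latency} for crashes suspected by `horizon`.
    Messages are assumed instantaneous and lossless, so this model
    produces no false suspicions."""
    latencies = {}
    for node in nodes:
        if node.crash_time == INF:
            continue
        # Last heartbeat sent at or before the crash instant.
        last_hb = (node.crash_time // heartbeat_period) * heartbeat_period
        suspect_at = last_hb + timeout
        if suspect_at <= horizon:
            latencies[node.name] = suspect_at - node.crash_time
    return latencies


if __name__ == "__main__":
    names = [f"node{i}" for i in range(20)]
    nodes = inject_failures(names, mtbf=50.0, horizon=100.0, seed=42)
    lat = run_detector(nodes, heartbeat_period=1.0, timeout=3.0, horizon=100.0)
    print(f"{len(lat)} of {len(nodes)} nodes crashed and were detected")
    print(f"worst detection latency: {max(lat.values(), default=0.0):.2f}")
```

Because the crash falls somewhere within a heartbeat interval, each detection latency in this model lies in the interval (`timeout` − `heartbeat_period`, `timeout`]. Varying `mtbf` is one simple way to emulate the different volatility conditions the abstract mentions. A realistic experiment would also model message delays and loss, which is where false suspicions and the timeout trade-off appear.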