FT-Grid: a system for achieving fault tolerance in grids

  • Authors:
  • Jie Xu; Paul Townend; Nik Looker; Paul Groth

  • Affiliations:
  • School of Computing, University of Leeds, Leeds LS2 9JT, U.K.; School of Computing, University of Leeds, Leeds LS2 9JT, U.K.; School of Computing, University of Leeds, Leeds LS2 9JT, U.K.; School of Electronics and Computer Science, University of Southampton, Southampton S017 1B, U.K.

  • Venue:
  • Concurrency and Computation: Practice & Experience - Selected Papers from the 2005 U.K. e-Science All Hands Meeting (AHM 2005)
  • Year:
  • 2008

Abstract

The FT-Grid system introduces a fault-tolerance framework that allows faults occurring in service-oriented systems to be tolerated, thus increasing the dependability of such systems. This paper presents the design, development and evaluation of FT-Grid. We show empirical evidence of the dependability benefits offered by FT-Grid by performing an experimental dependability analysis using fault-injection testing performed with the WS-FIT tool. We then illustrate a potential problem with voting-based fault-tolerance schemes in the service-oriented paradigm—namely that individual channels within a fault-tolerant system, supposed to be independent of each other, may in fact invoke common services as part of their workflow, thus increasing the potential for common-mode failure of those channels. We propose a solution to this issue by using the technique of provenance to provide FT-Grid with topological awareness. We implement a large experimental system, and—with the use of the Provenance Recording for Services system developed as part of the PASOA project at the University of Southampton—perform a large number of experiments that show that a topologically aware FT-Grid system serves as a much more dependable system than any other configuration tested, while imposing a negligible timing overhead. Copyright © 2007 John Wiley & Sons, Ltd.
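
The common-mode problem described in the abstract can be illustrated with a small sketch. This is not the FT-Grid implementation: the abstract does not specify the voting algorithm, so the weighting rule below (a channel's vote is divided by the number of channels that share its most widely shared service) is an assumption chosen for illustration only. The function names, the channel and service names, and the shape of the `provenance` mapping are likewise hypothetical; a real system would obtain the service lists from recorded provenance.

```python
from collections import Counter, defaultdict


def majority_vote(results):
    """Return the value reported by a strict majority of channels, else None."""
    if not results:
        return None
    value, count = Counter(results.values()).most_common(1)[0]
    return value if count > len(results) / 2 else None


def topology_aware_vote(results, provenance):
    """Weighted vote that discounts channels sharing services (assumed rule).

    results:    channel name -> value returned by that channel
    provenance: channel name -> set of service names the channel invoked
    """
    # Count how many voting channels invoked each service.
    usage = Counter()
    for channel in results:
        for service in provenance.get(channel, set()):
            usage[service] += 1

    # Weight each channel by 1 / (largest sharing count among its services).
    totals = defaultdict(float)
    for channel, value in results.items():
        services = provenance.get(channel, set())
        sharing = max((usage[s] for s in services), default=1)
        totals[value] += 1.0 / sharing

    if not totals:
        return None
    winner = max(totals, key=totals.get)
    # Require the winner to hold more than half of the total weight.
    return winner if totals[winner] > sum(totals.values()) / 2 else None


# Hypothetical scenario: channels A, B and C all invoke one faulty service
# ("shared") and return 41; the independent channels D and E return 42.
results = {"A": 41, "B": 41, "C": 41, "D": 42, "E": 42}
provenance = {
    "A": {"a1", "shared"},
    "B": {"b1", "shared"},
    "C": {"c1", "shared"},
    "D": {"d1"},
    "E": {"e1"},
}

print(majority_vote(results))                    # 41 (the common-mode failure wins)
print(topology_aware_vote(results, provenance))  # 42 (the shared channels are discounted)
```

In this scenario the three channels that depend on the same service outvote the two independent ones under plain majority voting, whereas the discounted vote treats them as roughly one opinion. Other discounting rules are equally plausible; the point is only that knowledge of which services each channel invoked lets the voter avoid counting correlated channels as independent.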