An adaptive approach to achieving hardware and software fault tolerance in a distributed computing environment

  • Authors:
  • A. Bondavalli; S. Chiaradonna; F. Di Giandomenico; J. Xu

  • Affiliations:
  • University of Firenze, Firenze, Italy; CNUCE/CNR, Pisa, Italy; IEI/CNR, Pisa, Italy; University of Durham, Durham, UK

  • Venue:
  • Journal of Systems Architecture: the EUROMICRO Journal
  • Year:
  • 2002

Abstract

This paper addresses the problem of tolerating both hardware and software faults in independent applications running in a distributed computing environment. Several hybrid fault-tolerant architectures are identified and proposed. Because the characteristics of the operating environment are highly variable and dynamic, the solutions rely mainly on adaptation: redundant programs are executed adaptively so as to minimise hardware resource consumption and to shorten response time, as far as possible, for a required level of fault tolerance. A method is introduced for evaluating the proposed architectures with respect to reliability, resource utilisation and response time, and examples of quantitative evaluations are given.
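
The idea of adaptive execution of redundant programs can be illustrated with a minimal sketch. This is not the paper's architecture. It assumes, purely for illustration, that redundant variants are plain Python callables run one after another, that a crash (hardware or software) appears as an exception, and that a result is accepted once a hypothetical threshold `agree_needed` of identical outputs is reached. The names `adaptive_execute`, `agree_needed` and the sample variants are all invented for this example.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence, Tuple


def adaptive_execute(
    variants: Sequence[Callable[[int], Hashable]],
    x: int,
    agree_needed: int = 2,
) -> Tuple[Optional[Hashable], int]:
    """Run variants only until `agree_needed` identical results are seen.

    Returns (accepted_result_or_None, number_of_variants_executed).
    Assumption: variants run sequentially here; a real system would
    place them on different nodes.
    """
    votes: Counter = Counter()
    executed = 0
    for variant in variants:
        try:
            votes[variant(x)] += 1
        except Exception:
            pass  # a crashed variant contributes no vote
        executed += 1
        # Stop as soon as enough identical results have accumulated,
        # so spare variants are never run in the fault-free case.
        if votes and votes.most_common(1)[0][1] >= agree_needed:
            return votes.most_common(1)[0][0], executed
    return None, executed  # no sufficient agreement: signal failure


# Hypothetical variants of the same computation (squaring).
def good(x: int) -> int:
    return x * x


def faulty(x: int) -> int:
    return x * x + 1  # simulated software fault


def crashing(x: int) -> int:
    raise RuntimeError("simulated node crash")
```

In the fault-free case only two of three variants run, which is where the saving in hardware resources comes from; a third variant is consumed only when the first two disagree or one of them fails.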