Discriminating Fault Rate and Persistency to Improve Fault Treatment

  • Authors:
  • A. Bondavalli;S. Chiaradonna;F. Di Giandomenico;F. Grandoni

  • Venue:
  • Proceedings of the 27th International Symposium on Fault-Tolerant Computing (FTCS '97)
  • Year:
  • 1997

Abstract

This paper addresses the consolidated identification of faults, distinguished as transient or permanent/intermittent. Discrimination of transient faults has long been performed in commercial systems, where threshold-based techniques have been used for this purpose for several years. The present work aims to increase the usefulness of the count-and-threshold scheme by analyzing its behavior and exploring its effects on the system. To this end, the scheme is mechanized as a device named α-count, endowed with a few controllable parameters. α-count tries to balance two conflicting requirements: keeping in the system those components that have experienced only transient faults, and quickly removing those affected by permanent or intermittent faults. Analytical models are derived that allow detailed study of α-count's behavior; the actual evaluation, over a range of configurations, is performed with standard tools, in terms of the delay in spotting faulty components and the probability of improperly blaming correct ones.
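
The abstract does not state the update rule or parameter values, so the following is only a minimal sketch of a generic count-and-threshold heuristic, not the authors' definition. It assumes a per-component score that grows by one on each error signal and is multiplied by a decay factor on each error-free judgment, with the component flagged once the score reaches a threshold. The names `CountAndThreshold`, `decay`, `threshold`, and `first_flag` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class CountAndThreshold:
    """Illustrative count-and-threshold filter for one component.

    Assumed (not taken from the abstract): the score increases by 1 on
    an error signal and is multiplied by `decay` otherwise.
    """
    decay: float = 0.9       # assumed decay factor, 0 <= decay <= 1
    threshold: float = 3.0   # assumed flagging threshold
    score: float = 0.0

    def observe(self, error: bool) -> bool:
        """Record one judgment; return True if the component is flagged."""
        if error:
            self.score += 1.0
        else:
            self.score *= self.decay
        return self.score >= self.threshold


def first_flag(signals: Iterable[bool], decay: float,
               threshold: float) -> Optional[int]:
    """Return the 1-based step at which the component is first flagged,
    or None if it is never flagged over the given signals."""
    unit = CountAndThreshold(decay=decay, threshold=threshold)
    for step, error in enumerate(signals, start=1):
        if unit.observe(error):
            return step
    return None


if __name__ == "__main__":
    rare = [True, False, False, False] * 10   # sparse, transient-like errors
    frequent = [True, False] * 10             # recurring, intermittent-like errors
    print(first_flag(rare, decay=0.5, threshold=3.0))
    print(first_flag(frequent, decay=0.9, threshold=3.0))
```

Under these assumptions, the two parameters expose the trade-off named in the abstract: a smaller decay factor or a higher threshold forgives sparse errors more readily but lengthens the delay before a persistently faulty component is flagged.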