A Framework for Node-Level Fault Tolerance in Distributed Real-Time Systems

  • Authors: Joakim Aidemark; Johan Karlsson
  • Affiliations: Volvo Car Corporation; Chalmers University of Technology
  • Venue: DSN '05 Proceedings of the 2005 International Conference on Dependable Systems and Networks
  • Year: 2005

Abstract

This paper describes a framework for achieving node-level fault tolerance (NLFT) in distributed real-time systems. The objective of NLFT is to mask errors at the node level in order to reduce the probability of node failures and thereby improve system dependability. We describe an approach called light-weight NLFT, in which transient faults are masked locally in the nodes by time-redundant execution of application tasks. The advantages of light-weight NLFT are demonstrated by a reliability analysis of an example brake-by-wire architecture. The results show that the use of light-weight NLFT may provide 55% higher reliability after one year and almost 60% higher MTTF, compared to using fail-silent nodes.
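The abstract does not spell out how time-redundant execution masks a transient fault, so the following is only an illustrative sketch of one common scheme, not necessarily the authors' mechanism. It assumes a task is run twice and the results compared; on a mismatch a third run is made and the value that two runs agree on is used. The names `run_time_redundant`, `NodeFailure`, and `replay` are hypothetical and do not come from the paper.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


class NodeFailure(Exception):
    """Hypothetical: raised when no two executions agree, so the error
    cannot be masked locally (the node would then have to fail)."""


def run_time_redundant(task: Callable[[], T]) -> T:
    """Assumed scheme: double execution with comparison, then a third
    execution and a 2-out-of-3 agreement check on mismatch."""
    first = task()
    second = task()
    if first == second:
        # Both runs agree: accept the result without further work.
        return first
    # The runs disagree, so one of them was presumably hit by a transient fault.
    third = task()
    if third == first or third == second:
        # The third run agrees with one earlier run: the error is masked.
        return third
    raise NodeFailure("no two executions produced the same result")


def replay(values: Iterable[T]) -> Callable[[], T]:
    """Test helper (hypothetical): a task that returns the given values in order,
    simulating a computation whose second run is corrupted, for example."""
    it = iter(values)
    return lambda: next(it)


# Second execution is corrupted; the third run restores agreement.
result = run_time_redundant(replay([42, 7, 42]))
```

In this sketch the fault-free case costs two executions and the masked case three, which is the time overhead traded for not shutting the node down on every detected error.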