Recovery in Fault-Tolerant Distributed Microcontrollers

  • Authors:
  • David A. Rennels; Riki Hwang

  • Venue:
  • DSN '01 Proceedings of the 2001 International Conference on Dependable Systems and Networks (formerly: FTCS)
  • Year:
  • 2001

Abstract

This paper describes the use of fault tolerance in a microcontroller node intended for a network of embedded processors. It is primarily motivated by long-life space applications, where radiation-induced transient errors will be frequent and a few chip failures may be expected before a mission is completed. A testbed has been constructed, and a real-time executive has been developed and tested on it. Preliminary fault-insertion testing has begun. Because of interconnection constraints for latchup circumvention, among other reasons, we have chosen a design that is not Byzantine resilient. Even though inconsistent signaling may occasionally occur, multiple recovery actions must converge on successful testing and restart of the system to regain correct functionality.
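The requirement that recovery converge despite occasional inconsistent signaling can be illustrated with a minimal Python sketch. It is not the authors' mechanism: the coordinator loop, the `Node` class, `test_and_restart`, `recover`, and the `missed` map are all hypothetical. It assumes that the recovery command is idempotent, so a node that misses one broadcast simply recovers on a later one, and that a node's self-test passes once it has restarted.

```python
"""Hypothetical sketch: retrying an idempotent recovery command until all nodes converge.

Assumptions (not from the paper): a coordinator repeatedly broadcasts a
"test-and-restart" command; any node may miss a given broadcast (standing in
for inconsistent signaling); re-sending the command to a recovered node is harmless.
"""
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool = False

    def test_and_restart(self) -> None:
        # Idempotent by assumption: the self-test is taken to pass after restart,
        # and running it again on an already-recovered node leaves it healthy.
        self.healthy = True


def recover(nodes, missed, max_rounds=10):
    """Broadcast recovery each round until every node is healthy.

    `missed` maps a round number to the set of node names that fail to
    receive that round's broadcast. Returns the round in which all nodes
    became healthy, or None if recovery did not converge within max_rounds.
    """
    for rnd in range(1, max_rounds + 1):
        for node in nodes:
            # A node that misses this broadcast keeps its current state.
            if node.name not in missed.get(rnd, set()):
                node.test_and_restart()
        if all(n.healthy for n in nodes):
            return rnd
    return None
```

With `missed={1: {"B"}, 2: {"B", "C"}}` and nodes A, B, C, node B misses two broadcasts and recovers in round 3, so `recover` returns 3. The `max_rounds` bound means a node that never receives the command shows up as a non-converged result (`None`) instead of an endless retry loop.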