Realizing a fault-tolerant embedded controller on distributed real-time systems

  • Authors:
  • Junsung Kim;Praful Puranik;Ragunathan (Raj) Rajkumar

  • Affiliations:
  • Carnegie Mellon University, Pittsburgh, Pennsylvania;Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India;Carnegie Mellon University, Pittsburgh, Pennsylvania

  • Venue:
  • ACM SIGBED Review - Special Issue on the 5th Workshop on Adaptive and Reconfigurable Embedded Systems
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Advances in real-time, embedded and distributed systems along with control and communication theory have catalyzed the rapid emergence of cyber-physical systems such as a self-driving car. The importance of fault-tolerance support on a cyber-physical system (CPS) has been greatly emphasized by recent research due to the nature of CPS that senses its surroundings, processes sensor data, and reacts using its actuators. In order to tackle this challenge, we proposed SAFER (System-level Architecture for Failure Evasion in Real-time Applications) in our previous work. SAFER is able to tolerate fail-stop processor and/or task failures for distributed embedded real-time systems. One of its limitations, however, is that SAFER is not capable of tolerating a failure of a processor with a dedicated connection to an actuator. This paper provides a method that relaxes this limitation by (1) deploying a small piece of hardware to avoid a dedicated connection between a processor and an actuator, (2) adding a software module that monitors and controls the hardware, and (3) enhancing the failure detection and recovery mechanisms of SAFER to support these changes. The detailed implementation and evaluation of the SAFER extension is on-going work.