COFTA: Hardware-Software Co-Synthesis of Heterogeneous Distributed Embedded Systems for Low Overhead Fault Tolerance

  • Authors:
  • Bharat P. Dave;Niraj K. Jha

  • Affiliations:
  • Bell Laboratories, Holmdel, NJ;Princeton Univ., Princeton, NJ

  • Venue:
  • IEEE Transactions on Computers
  • Year:
  • 1999

Quantified Score

Hi-index 14.98

Visualization

Abstract

Embedded systems employed in critical applications demand high reliability and availability in addition to high performance. Hardware-software co-synthesis of an embedded system is the process of partitioning, mapping, and scheduling its specification into hardware and software modules to meet performance, cost, reliability, and availability goals. In this paper, we address the problem of hardware-software co-synthesis of fault-tolerant real-time heterogeneous distributed embedded systems. Fault detection capability is imparted to the embedded system by adding assertion and duplicate-and-compare tasks to the task graph specification prior to co-synthesis. The dependability (reliability and availability) of the architecture is evaluated during co-synthesis. Our algorithm, called COFTA (Co-synthesis Of Fault-Tolerant Architectures), allows the user to specify multiple types of assertions for each task. It uses the assertion or combination of assertions which achieves the required fault coverage without incurring too much overhead. We propose new methods to: 1) Perform fault tolerance based task clustering, which determines the best placement of assertion and duplicate-and-compare tasks, 2) Derive the best error recovery topology using a small number of extra processing elements, 3) Exploit multidimensional assertions, and 4) Share assertions to reduce the fault tolerance overhead. Our algorithm can tackle multirate systems commonly found in multimedia applications. Application of the proposed algorithm to a large number of real-life telecom transport system examples (the largest example consisting of 2,172 tasks) shows its efficacy. For fault-secure architectures, which just have fault detection capabilities, COFTA is able to achieve up to 48.8 percent and 25.6 percent savings in embedded system cost over architectures employing duplication and task-based fault tolerance techniques, respectively. The average cost overhead of COFTA fault-secure architectures over simplex architectures is only 7.3 percent. In case of fault-tolerant architectures, which cannot only detect but also tolerate faults, COFTA is able to achieve up to 63.1 percent and 23.8 percent savings in embedded system cost over architectures employing triple-modular redundancy, and task-based fault tolerance techniques, respectively. The average cost overhead of COFTA fault-tolerant architectures over simplex architectures is only 55.4 percent.