A light-weight cache-based fault detection and checkpointing scheme for MPSoCs enabling relaxed execution synchronization

  • Authors:
  • Chengmo Yang;Alex Orailoglu

  • Affiliations:
  • UC San Diego, San Diego, CA, USA;UC San Diego, San Diego, CA, USA

  • Venue:
  • CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

While technology advances have made MPSoCs a standard architecture for embedded systems, their applicability is increasingly being challenged by dramatic increases in the amount of device failures that may occur during execution. Conventional fault tolerance techniques employ a duplication-and-comparison strategy to detect arbitrary execution faults, as well as a checkpointing-and-rollback strategy to recover from the faulty state. Comparison and checkpointing are performed either at task level, thus imposing a large amount of overhead in verifying and backing up memory pages, or at instruction level, thus necessitating a lock-step execution model which significantly limits the attainable performance. To overcome the shortcomings of both strategies, in this paper we propose a cache-based fault tolerance scheme wherein the comparison and checkpointing process is performed at the cache-memory interface. By allowing two processors that execute duplicated tasks to share a single data cache, the proposed scheme is able to verify execution results before writing them back into memory, thus protecting the memory from being polluted by execution faults. This in turn significantly reduces the checkpointing overhead. Meanwhile, since only the data written into memory are compared, the strict instruction-by-instruction synchronization model used in multithreading processors can be relaxed. The simulation results confirm that the proposed scheme only imposes a performance overhead ranging from 1.4% to 10.4%, while both fault detection and execution checkpointing can be effectively attained.