Using explicit output comparisons for fault tolerant scheduling (FTS) on modern high-performance processors

  • Authors:
  • Yue Gao;Sandeep K. Gupta;Melvin A. Breuer

  • Affiliations:
  • University of Southern California, Los Angeles;University of Southern California, Los Angeles;University of Southern California, Los Angeles

  • Venue:
  • Proceedings of the Conference on Design, Automation and Test in Europe
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Soft errors and errors caused by intermittent faults are a major concern for modern processors. In this paper we provide a drastically different approach for fault tolerant scheduling (FTS) of tasks in such processors. Traditionally in FTS, error detection is performed implicitly and concurrently with task execution, and associated overheads are incurred as increases in software run-time or hardware area. However, such embedded error detection (EED) techniques, e.g., watchdog processor assisted control flow checking, only provide approximately 70% error coverage [1, 2]. We propose the idea of utilizing straightforward explicit output comparison (EOC) which provides nearly 100% error coverage. We construct a framework for utilizing EOC in FTS, identify new challenges and tradeoffs, and develop a new off-line scheduling algorithm for EOC. We show that our EOC based approach provides higher error coverage and an average performance improvement of nearly 10% over EED-based FTS approaches, without increasing resource requirements. In our ongoing research we are identifying a richer set of ways of applying EOC, by itself and in conjunction with EED, to obtain further improvements.