Reli: hardware/software checkpoint and recovery scheme for embedded processors

  • Authors:
  • Tuo Li;Roshan Ragel;Sri Parameswaran

  • Affiliations:
  • University of New South Wales, Sydney, Australia;University of Peradeniya, Peradeniya, Sri Lanka;University of New South Wales, Sydney, Australia

  • Venue:
  • DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Checkpoint and Recovery (CR) allows computer systems to operate correctly even when compromised by transient faults. While many software systems and hardware systems for CR do exist, they are usually either too large, require major modifications to the software, too slow, or require extensive modifications to the caching schemes. In this paper, we propose a novel error-recovery management scheme, which is based upon re-engineering the instruction set. We take the native instruction set of the processor and enhance the microinstructions with additional micro-operations which enable checkpointing. The recovery mechanism is implemented by three custom instructions, which recover the registers which were changed, the data memory values which were changed and the special registers (PC, status registers etc.) which were changed. Our checkpointing storage is changed according to the benchmark executed. Results show that our method degrades performance by just 1.45% under fault free conditions, and incurs area overhead of 45% on average and 79% in the worst case. The recovery takes just 62 clock cycles (worst case) in the examples which we examined.