Efficient fault tolerance in multi-media applications through selective instruction replication

  • Authors:
  • Ayswarya Sundaram;Ameen Aakel;Derek Lockhart;Darshan Thaker;Diana Franklin

  • Affiliations:
  • Cal Poly State Univ, SLO, CA, USA;Cal Poly State Univ, SLO, CA, USA;Cornell University, Ithaca, NY, USA;UC Davis, Davis, USA;UCSB, Santa Barbara, USA

  • Venue:
  • Proceedings of the 2008 workshop on Radiation effects and fault tolerance in nanometer technologies
  • Year:
  • 2008

Abstract

As voltages decrease, soft errors are expected to become an increasing problem in maintaining program correctness. Unfortunately, previous mechanisms to improve processor reliability protect all processor instructions equally, so they suffer from significant performance degradation, substantial hardware overhead, or both. However, recent research has shown that in multimedia applications such as photography, video, and audio, not all instructions are created equal: many operations prove to be far more tolerant of faults than others [1]. This observation can be leveraged to limit the cost of reliable computing by protecting only those instructions that are critical to correct execution. We propose a mechanism to protect against soft errors through selective instruction replication. We begin with a dynamic instruction replication framework that replicates every instruction and checks the results at commit, rolling back on any inconsistency. Rather than replicating the entire program, we leave unprotected those instructions that the compiler identifies as tolerant of error. While full replication incurs 40% to 100% overhead, our mechanism incurs only 30% to 75%, a 15-33% reduction, with minimal additional hardware. This approach costs only 0.5-1% in fidelity.
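The idea of selective replication can be illustrated with a small software model. The following is only a sketch under stated assumptions, not the hardware mechanism described in the paper: the `Instr` type, the `tolerant` flag standing in for the compiler's classification, the `execute` callbacks, and the retry limit are all hypothetical names introduced for illustration. Critical instructions are executed twice and committed only when both results agree; tolerant instructions run once and are never checked.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Instr:
    op: Callable[[int], int]  # hypothetical: maps the current value to a new one
    tolerant: bool            # hypothetical compiler tag: True = leave unprotected


def clean_execute(instr: Instr, value: int) -> int:
    """Fault-free execution unit."""
    return instr.op(value)


def make_faulty_execute(fault_at: int) -> Callable[[Instr, int], int]:
    """Execution unit that flips the low bit of the result of call number fault_at."""
    calls = 0

    def execute(instr: Instr, value: int) -> int:
        nonlocal calls
        calls += 1
        result = instr.op(value)
        return result ^ 1 if calls == fault_at else result

    return execute


def run(program: List[Instr],
        execute: Callable[[Instr, int], int] = clean_execute,
        value: int = 0,
        max_retries: int = 3) -> Tuple[int, int]:
    """Return (final value, number of instruction executions performed)."""
    executions = 0
    for instr in program:
        if instr.tolerant:
            # Unprotected: execute once, commit whatever comes back.
            value = execute(instr, value)
            executions += 1
            continue
        # Protected: execute twice and compare before committing.
        for _ in range(max_retries):
            first = execute(instr, value)
            second = execute(instr, value)
            executions += 2
            if first == second:
                value = first
                break
            # Mismatch: discard both results and re-execute (models rollback).
        else:
            raise RuntimeError("replicas kept disagreeing")
    return value, executions


# Toy program: two critical and two tolerant instructions.
program = [
    Instr(lambda v: v + 10, tolerant=False),
    Instr(lambda v: v * 2, tolerant=True),
    Instr(lambda v: v + 3, tolerant=False),
    Instr(lambda v: v - 1, tolerant=True),
]
```

In this toy program, a fault-free run performs 6 executions instead of the 8 that full replication would need. A fault injected into a protected instruction (`make_faulty_execute(1)`) is detected and costs one extra pair of executions, while a fault injected into a tolerant instruction (`make_faulty_execute(3)`) goes unnoticed and slightly perturbs the final value, which is the kind of fidelity loss the abstract refers to.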