Fault-tolerant typed assembly language

Authors:
Frances Perry;Lester Mackey;George A. Reis;Jay Ligatti;David I. August;David Walker
Affiliations:
Princeton University, Princeton, NJ;Princeton University, Princeton, NJ;Princeton University, Princeton, NJ;University of South Florida, Tampa, FL;Princeton University, Princeton, NJ;Princeton University, Princeton, NJ
Venue:
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Year:
2007

Abstract

A transient hardware fault occurs when an energetic particle strikes a transistor, causing it to change state. Although transient faults do not permanently damage the hardware, they may corrupt computations by altering stored values and signal transfers. In this paper, we propose a new scheme for provably safe and reliable computing in the presence of transient hardware faults. In our scheme, software computations are replicated to provide redundancy, while special instructions compare the independently computed results to detect errors before writing critical data. In contrast to previous efforts in this area, we analyze our fault tolerance scheme from a formal, theoretical perspective. First, we provide an operational semantics for our assembly language, which includes a precise formal definition of our fault model. Second, we develop an assembly-level type system designed to detect reliability problems in compiled code. Third, we provide a formal specification for program fault tolerance under the given fault model and prove that all well-typed programs are indeed fault tolerant. In addition to the formal analysis, we evaluate our detection scheme and show that the fault-tolerant code takes only 34% longer to execute than the unreliable version.
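
The replicate-and-compare idea in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's assembly language or type system: the names `FaultDetected`, `checked_store`, and `redundant_increment` are hypothetical, the dictionary stands in for memory, and the `flip_bit` parameter is an invented way to inject a single bit flip into one of the two copies.

```python
class FaultDetected(Exception):
    """Raised when the two redundant copies disagree (hypothetical name)."""


def checked_store(memory, addr_a, addr_b, val_a, val_b):
    # Compare both copies of the address and the value before committing
    # the write, so a corrupted copy never reaches memory.
    if addr_a != addr_b or val_a != val_b:
        raise FaultDetected("redundant copies disagree; store aborted")
    memory[addr_a] = val_a


def redundant_increment(memory, addr, flip_bit=None):
    # Compute the same result twice, independently.
    a = memory[addr] + 1
    b = memory[addr] + 1
    if flip_bit is not None:
        # Assumption for illustration: simulate one transient fault by
        # flipping a single bit in one copy only.
        a ^= 1 << flip_bit
    checked_store(memory, addr, addr, a, b)
    return memory[addr]


if __name__ == "__main__":
    mem = {0: 41}
    print(redundant_increment(mem, 0))      # fault-free run commits the write
    try:
        redundant_increment(mem, 0, flip_bit=3)
    except FaultDetected as err:
        print("detected:", err)             # faulty run is stopped before the write
    print(mem[0])                           # memory still holds the last good value
```

In this sketch a single corrupted copy is caught at the comparison, so the faulty value is never written. The simulation checks this only at run time for one example, whereas the paper establishes the corresponding guarantee statically, through its type system and proof.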