Implementing fault-tolerance in real-time programs by automatic program transformations

Authors:
Tolga Ayav;Pascal Fradet;Alain Girault
Affiliations:
INRIA and Izmir Institute of Technology, Turkey;INRIA and University of Grenoble, France;INRIA and University of Grenoble, France
Venue:
ACM Transactions on Embedded Computing Systems (TECS)
Year:
2008

Abstract

We present a formal approach to implementing fault tolerance in real-time embedded systems. The initial fault-intolerant system consists of a set of independent periodic tasks scheduled onto a set of fail-silent processors connected by a reliable communication network. We transform the tasks so that, given one additional spare processor, the system tolerates one failure at a time, whether transient or permanent. Failure detection is implemented by heartbeating, and failure masking by checkpointing and rollback. Both techniques are described and implemented as automatic program transformations on the tasks' programs. This formal, transformation-based approach to fault tolerance highlights the benefits of separation of concerns. It allows us to establish correctness properties and to compute optimal parameter values that minimize the fault-tolerance overhead. We also present an implementation of our method to demonstrate its feasibility and efficiency.
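
To make the checkpointing-and-rollback idea concrete, here is a minimal Python sketch. It is not the authors' transformation: the names `run_with_checkpoints`, `TransientFault`, `checkpoint_every`, and `fail_at` are hypothetical, the task is modeled as a list of step functions, and the single fault is injected by hand rather than detected by heartbeats. It only illustrates how the checkpoint interval trades checkpointing work against re-execution after a fault.

```python
import copy


class TransientFault(Exception):
    """Hypothetical stand-in for a failure reported by a detector."""


def run_with_checkpoints(steps, state, checkpoint_every, fail_at=None):
    """Run `steps` (functions state -> state) with periodic checkpoints.

    `fail_at` injects one transient fault just before that step index,
    mimicking the "one failure at a time" assumption.
    Returns (final_state, number_of_steps_executed).
    """
    # Initial checkpoint: the state before any step has run.
    saved_state, saved_index = copy.deepcopy(state), 0
    i, executed = 0, 0
    while i < len(steps):
        try:
            if i == fail_at:
                fail_at = None  # transient: the fault occurs only once
                raise TransientFault()
            state = steps[i](state)
            executed += 1
            i += 1
            if i % checkpoint_every == 0:
                # Save a copy of the state and the position reached.
                saved_state, saved_index = copy.deepcopy(state), i
        except TransientFault:
            # Rollback: restore the last checkpoint and resume from there.
            state, i = copy.deepcopy(saved_state), saved_index
    return state, executed


if __name__ == "__main__":
    task = [lambda s: s + 1] * 10
    print(run_with_checkpoints(task, 0, checkpoint_every=4, fail_at=6))
    print(run_with_checkpoints(task, 0, checkpoint_every=2, fail_at=6))
```

In this toy run, a shorter checkpoint interval means fewer steps are re-executed after the fault, at the price of more frequent state copies. That tension is the kind of overhead trade-off the abstract refers to when it mentions computing optimal parameter values.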