We present a formal approach to implementing fault tolerance in real-time embedded systems. The initial, fault-intolerant system consists of a set of independent periodic tasks scheduled onto a set of fail-silent processors connected by a reliable communication network. We transform the tasks so that, assuming an additional spare processor is available, the system tolerates one failure at a time (transient or permanent). Failure detection is implemented by heartbeating, and failure masking by checkpointing and rollback. Both techniques are described and implemented as automatic program transformations of the tasks' programs. This formal, transformation-based approach to fault tolerance highlights the benefits of separation of concerns: it allows us to establish correctness properties and to compute optimal parameter values that minimize the fault-tolerance overhead. We also present an implementation of our method, demonstrating its feasibility and efficiency.
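To make the combination of heartbeating, checkpointing and rollback concrete, here is a minimal, hypothetical simulation. It is not the authors' transformation. It assumes a task is a list of steps over an integer state, one action runs per clock tick, and the spare keeps the most recent checkpoint. The names `transform`, `run`, `ckpt_every`, `fail_at` and `timeout` are illustrative. The transformation inserts a heartbeat after every step and a checkpoint after every `ckpt_every` steps. When the fail-silent main processor stops, the spare's detector times out, restores the last checkpoint, and resumes execution.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical model: a task is a list of steps, each mapping an int state to a new state.
Step = Callable[[int], int]
Action = Tuple  # ("step", f) | ("heartbeat",) | ("checkpoint",)


def transform(steps: List[Step], ckpt_every: int) -> List[Action]:
    """Insert a heartbeat after every step and a checkpoint after every `ckpt_every` steps."""
    prog: List[Action] = []
    for i, f in enumerate(steps, start=1):
        prog.append(("step", f))
        prog.append(("heartbeat",))
        if i % ckpt_every == 0:
            prog.append(("checkpoint",))
    return prog


def run(steps: List[Step], ckpt_every: int,
        fail_at: Optional[int] = None, timeout: int = 3):
    """Run one action per clock tick. If the main processor fails silently at tick
    `fail_at`, the spare detects it by heartbeat timeout and rolls back."""
    prog = transform(steps, ckpt_every)
    state, pc = 0, 0
    saved = (state, pc)          # last checkpoint, assumed to be stored on the spare
    last_hb, clock = 0, 0
    active, failed = "main", False
    log = []
    while pc < len(prog):
        clock += 1
        if active == "main" and clock == fail_at:
            failed = True        # fail-silent: the main processor stops emitting anything
        if active == "spare" or not failed:
            kind = prog[pc][0]
            if kind == "step":
                state = prog[pc][1](state)
            elif kind == "heartbeat":
                last_hb = clock
            else:                # checkpoint: state and resume point sent to the spare
                saved = (state, pc + 1)
            pc += 1
        # Failure detector running on the spare: no heartbeat for more than `timeout` ticks.
        if active == "main" and clock - last_hb > timeout:
            state, pc = saved    # rollback: discard work done since the last checkpoint
            active, last_hb = "spare", clock
            log.append(("detected", clock))
    return state, active, log
```

For example, with `steps = [lambda s, k=k: s + k for k in range(1, 7)]`, `run(steps, 2)` returns `(21, 'main', [])`. `run(steps, 2, fail_at=8)` returns `(21, 'spare', [('detected', 11)])`. The spare re-executes the third step after rolling back to the checkpoint taken after step two and still produces the fault-free result. The timeout must exceed the longest gap between heartbeats in the transformed program. Otherwise a healthy processor would be falsely suspected.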
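The abstract mentions computing optimal parameter values that minimize fault-tolerance overhead. The following sketch shows what such an optimization can look like, using a deliberately simple cost model that is an assumption and not the paper's. A task of worst-case execution time `C` has `n` evenly spaced checkpoints, each costing `c_ckpt`, and a single failure strikes at the worst moment. The worst-case completion time then adds the detection latency `detect`, the restore cost `restore`, and the re-execution of at most one segment of length `C / (n + 1)`. All names here are hypothetical.

```python
def worst_case_time(C: float, n: int, c_ckpt: float,
                    detect: float, restore: float) -> float:
    """Hypothetical worst-case completion time with n evenly spaced checkpoints
    and one failure: base execution + checkpoint overhead + detection latency
    + restore cost + re-execution of at most one segment C / (n + 1)."""
    return C + n * c_ckpt + detect + restore + C / (n + 1)


def best_checkpoint_count(C: float, c_ckpt: float, detect: float,
                          restore: float, max_n: int = 100) -> int:
    """Exhaustively choose the n in [0, max_n] that minimizes worst_case_time."""
    return min(range(max_n + 1),
               key=lambda n: worst_case_time(C, n, c_ckpt, detect, restore))
```

With `C = 100`, `c_ckpt = 2`, `detect = 5` and `restore = 1`, `best_checkpoint_count(100, 2, 5, 1)` returns `6`. More checkpoints shrink the segment that must be re-executed but add a fixed cost each, so the model trades `2n` against `100 / (n + 1)`. In a real-time setting, the resulting worst-case time would also have to be checked against each task's deadline.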