We present a formal approach to implementing and certifying fault tolerance in real-time embedded systems. The initial, fault-intolerant system consists of a set of independent periodic tasks scheduled onto a set of fail-silent processors. We transform the tasks so that, given one additional spare processor, the system tolerates one failure at a time (transient or permanent). Failure detection is implemented using heartbeating, and failure masking using checkpointing and rollback. Both techniques are described and implemented as automatic transformations of the tasks' programs. This formal approach to fault tolerance by program transformation highlights the benefits of separation of concerns and allows us to establish correctness properties.
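
The two mechanisms named above can be illustrated with a small sketch. It is not the paper's transformation: the names `HeartbeatMonitor`, `Checkpoint`, `run_with_checkpoints`, and `run_tolerating_one_failure` are hypothetical, time is a logical tick counter, the task is a simple running sum, and a fail-silent failure is simulated by the run stopping without a result.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class HeartbeatMonitor:
    """Suspects a processor once no heartbeat has arrived for more than `timeout` ticks."""
    timeout: int
    last_seen: Dict[str, int] = field(default_factory=dict)

    def beat(self, proc: str, now: int) -> None:
        # Record the logical time of the latest heartbeat from `proc`.
        self.last_seen[proc] = now

    def failed(self, proc: str, now: int) -> bool:
        # A fail-silent processor simply stops sending heartbeats.
        return now - self.last_seen[proc] > self.timeout


@dataclass(frozen=True)
class Checkpoint:
    index: int  # next input element to process
    acc: int    # accumulated task state saved at that point


def run_with_checkpoints(
    data: List[int],
    start: Checkpoint,
    every: int,
    fail_at: Optional[int] = None,
) -> Tuple[Optional[int], Checkpoint]:
    """Sum `data` starting from `start`, saving a checkpoint every `every` steps.

    Returns (result, last_checkpoint); result is None if the simulated
    processor fails silently at step `fail_at`.
    """
    saved = start
    acc = start.acc
    for i in range(start.index, len(data)):
        if fail_at is not None and i == fail_at:
            return None, saved  # simulated fail-silent stop
        acc += data[i]
        if (i + 1) % every == 0:
            saved = Checkpoint(i + 1, acc)
    return acc, saved


def run_tolerating_one_failure(data: List[int], every: int, fail_at: Optional[int]) -> int:
    """Mask one failure by rolling back to the last checkpoint on a spare."""
    result, ckpt = run_with_checkpoints(data, Checkpoint(0, 0), every, fail_at)
    if result is None:
        # The spare resumes from the last saved state instead of restarting.
        result, _ = run_with_checkpoints(data, ckpt, every)
    return result
```

In this sketch the work done after the last checkpoint is repeated on the spare, so the checkpoint interval trades checkpointing overhead against recovery time. The paper's real-time setting would also have to bound both costs, which the sketch does not model.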