Online Diagnosis and Recovery: On the Choice and Impact of Tuning Parameters
IEEE Transactions on Dependable and Secure Computing
Role-Based Symmetry Reduction of Fault-Tolerant Distributed Protocols with Language Support
ICFEM '09 Proceedings of the 11th International Conference on Formal Engineering Methods: Formal Methods and Software Engineering
We present a tunable diagnostic protocol for generic time-triggered (TT) systems to detect crash and send/receive omission faults. Compared to existing diagnostic and membership protocols for TT systems, it does not rely on the single-fault assumption and tolerates malicious faults. It runs at the application level and can be added on top of any TT system (possibly as a middleware component) without requiring modifications at the system level. The information on detected faults is accumulated using a penalty/reward algorithm to handle transient faults. After a fault is detected, the likelihood of node isolation can be adapted to different system configurations, including those where functions with different criticality levels are integrated. Using actual automotive and aerospace parameters, we experimentally demonstrate the transient fault handling capabilities of the protocol.
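The penalty/reward accumulation mentioned above can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so everything here is an assumption made for illustration: the class name `PenaltyReward`, the two thresholds, the per-node `criticality` weight, and the rule that a run of fault-free rounds clears the accumulated penalty. It is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class PenaltyReward:
    """Per-node fault accumulator (hypothetical names and thresholds)."""

    penalty_threshold: int  # assumed: isolate once the penalty exceeds this
    reward_threshold: int   # assumed: fault-free rounds needed to forgive
    criticality: int = 1    # assumed: penalty added per faulty round
    penalty: int = 0
    reward: int = 0
    isolated: bool = False

    def observe(self, faulty: bool) -> bool:
        """Record one diagnostic round; return True if the node is isolated."""
        if self.isolated:
            return True  # isolation is treated as permanent in this sketch
        if faulty:
            # A detected fault raises the penalty and cancels any reward run.
            self.penalty += self.criticality
            self.reward = 0
            if self.penalty > self.penalty_threshold:
                self.isolated = True
        elif self.penalty > 0:
            # Fault-free rounds count toward forgiving earlier faults.
            self.reward += 1
            if self.reward >= self.reward_threshold:
                self.penalty = 0
                self.reward = 0
        return self.isolated
```

Under these assumptions, a transient fault followed by enough fault-free rounds leaves the node in service, while repeated faults push the penalty past the threshold. Raising `criticality` or lowering `penalty_threshold` makes isolation more likely for a given fault pattern, which is one way the tunability described in the abstract could be expressed.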