Fault tolerance is a natural candidate not only for fail-safe systems but also for reducing the effect of a failure and allowing the remaining tasks to continue. The proposed fault-tolerant architecture comprises the software design for error detection and diagnosis as well as for error recovery. Two configurations are employed and evaluated in terms of reliability and effectiveness: multiple tasks managed by one computer with a redundant backup, and multiple computers that provide redundancy for one another. The executing program is supervised by a watchdog, which signals a software failure when the execution time of any subprogram exceeds its default value. The computers periodically send heartbeat signals to each other, and the content of the received signal indicates whether the system is in a failure state. The entire heartbeat detection function is in turn supervised by a time daemon to ensure that fault recovery remains feasible. Once a computer fails, the other computer immediately takes over its role and completes the remaining tasks.
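The watchdog and heartbeat mechanisms described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names `Watchdog` and `HeartbeatMonitor`, the `timeout` parameter, and all time limits are hypothetical and chosen only for illustration. Timestamps are passed in explicitly so that the logic stays deterministic; in a real system a periodic timer, playing the role of the time daemon, would presumably call `check()`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Watchdog:
    """Flags subprograms whose execution time exceeds their default limit."""
    limits: Dict[str, float]  # subprogram name -> allowed seconds (assumed values)
    started: Dict[str, float] = field(default_factory=dict)

    def start(self, name: str, now: float) -> None:
        # Record when the subprogram began executing.
        self.started[name] = now

    def finish(self, name: str) -> None:
        # A subprogram that finished is no longer supervised.
        self.started.pop(name, None)

    def overrun(self, now: float) -> List[str]:
        # Names of running subprograms that have exceeded their limit.
        return sorted(
            name for name, t0 in self.started.items()
            if now - t0 > self.limits[name]
        )


@dataclass
class HeartbeatMonitor:
    """Tracks the peer computer's heartbeats and decides when to take over."""
    timeout: float                     # assumed maximum silence, in seconds
    last_seen: float = 0.0
    peer_reported_fault: bool = False
    active: bool = False               # True once this node has taken over

    def receive(self, now: float, peer_ok: bool) -> None:
        # The heartbeat message itself carries the peer's health status.
        self.last_seen = now
        self.peer_reported_fault = not peer_ok

    def check(self, now: float) -> bool:
        # Take over if the peer is silent for too long or reports a fault.
        silent = now - self.last_seen > self.timeout
        if silent or self.peer_reported_fault:
            self.active = True
        return self.active
```

In use, the backup computer would call `receive()` for each incoming heartbeat and `check()` on every timer tick. Once `check()` returns `True`, the backup stays active and carries on with the remaining tasks, while the watchdog on each computer separately reports any subprogram that overruns its limit.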