In a hot-standby replication system, the system can no longer process its tasks once all replicated nodes have failed. When some of the replicated nodes have failed, the surviving nodes must therefore be well protected against failure. Design faults and system-specific weaknesses can cause chain reactions of common faults across identical replicated nodes. These can be alleviated by replicating diverse hardware and software. Going one step further, failures on the remaining nodes can be suppressed by predicting and preventing the recurrence of a fault that has already occurred on a replicated node. In this paper, we propose a fault avoidance scheme that increases system dependability by avoiding common faults on the remaining nodes when some nodes fail, and we analyze the resulting system dependability.
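
The idea of avoiding a common fault on the surviving nodes can be illustrated with a small sketch. This is not the scheme proposed in the paper, whose details are not given in the abstract; it is a toy model under stated assumptions. The names `Replica`, `HotStandbyGroup`, `avoid_known_faults`, and the "poison" request that crashes every identical replica are all hypothetical, and the avoidance policy shown (remember the request that crashed a node and refuse to replay it on a standby) is only one simple way such a scheme might work.

```python
from dataclasses import dataclass, field


class NodeCrash(Exception):
    """Raised when a replica fails while handling a request."""


@dataclass
class Replica:
    name: str
    alive: bool = True

    def handle(self, request: str) -> str:
        # Assumed common design fault: every identical replica
        # crashes on the same "poison" input.
        if "poison" in request:
            self.alive = False
            raise NodeCrash(f"{self.name} crashed on {request!r}")
        return f"{self.name} processed {request!r}"


@dataclass
class HotStandbyGroup:
    replicas: list
    avoid_known_faults: bool = True
    known_bad: set = field(default_factory=set)

    def submit(self, request: str) -> str:
        # Avoidance step: never replay a request that already killed a node.
        if self.avoid_known_faults and request in self.known_bad:
            return f"rejected {request!r}"
        for replica in self.replicas:
            if not replica.alive:
                continue
            try:
                return replica.handle(request)
            except NodeCrash:
                self.known_bad.add(request)
                if self.avoid_known_faults:
                    # Protect the surviving standbys from the same fault.
                    return f"rejected {request!r}"
                # Without avoidance, fail over and retry on the next replica.
        raise RuntimeError("all replicas have failed")


def surviving(group: HotStandbyGroup) -> list:
    """Names of the replicas that are still alive."""
    return [r.name for r in group.replicas if r.alive]
```

With avoidance enabled, a poison request takes down only the primary and the standby keeps serving later requests; with `avoid_known_faults=False`, plain failover replays the request on each identical replica in turn until none is left, which is the chain reaction of common faults described above.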