This paper investigates the availability requirement for the fault management server in high-availability communication systems. Our study shows that the fault management server itself does not need 99.999% availability to guarantee 99.999% system availability, provided that the fail-safe ratio (the probability that a failure of the fault management server does not bring the system down) and the fault coverage ratio (the probability that a failure in the system is detected and recovered by the fault management server) are sufficiently high. The availability of the fault management server, the fail-safe ratio, and the fault coverage ratio can be traded off against one another to optimize system availability. We propose a cost-effective design for the fault management server.
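To make the tradeoff concrete, the following is a minimal sketch of a first-order toy model. It is not the model used in the paper, whose equations are not given in this abstract. The function names, the additive approximation, and the two default parameters `u_covered` and `u_uncovered` (the system unavailability contributed by faults that are and are not handled by the fault management server) are all assumptions chosen for illustration.

```python
# Toy first-order model (an assumption for illustration, not the paper's model).

def system_unavailability(a_fm, fail_safe, coverage,
                          u_covered=1e-6, u_uncovered=1e-3):
    """Approximate system unavailability.

    a_fm        -- availability of the fault management (FM) server
    fail_safe   -- probability that an FM server failure does not bring the system down
    coverage    -- probability that a system fault is detected and recovered by the FM server
    u_covered   -- hypothetical unavailability contributed by faults the FM server handles
    u_uncovered -- hypothetical unavailability contributed by faults it does not handle
    """
    # Downtime caused directly by FM server failures that are not fail-safe.
    fm_induced = (1.0 - a_fm) * (1.0 - fail_safe)
    # A fault is only covered if the FM server is up when the fault occurs.
    effective_coverage = a_fm * coverage
    fault_induced = (effective_coverage * u_covered
                     + (1.0 - effective_coverage) * u_uncovered)
    return fm_induced + fault_induced


def meets_five_nines(a_fm, fail_safe, coverage):
    """True if the toy model predicts at least 99.999% system availability."""
    return system_unavailability(a_fm, fail_safe, coverage) <= 1e-5


if __name__ == "__main__":
    # A "three nines" FM server with high fail-safe and coverage ratios.
    print(meets_five_nines(a_fm=0.999, fail_safe=0.999, coverage=0.999))
    # The same server with a low fail-safe ratio.
    print(meets_five_nines(a_fm=0.999, fail_safe=0.9, coverage=0.999))
    # The same server with a low fault coverage ratio.
    print(meets_five_nines(a_fm=0.999, fail_safe=0.999, coverage=0.9))
```

Under these assumed parameters, a fault management server with only 99.9% availability still satisfies the 99.999% system target when both ratios are 0.999, whereas lowering either ratio to 0.9 violates it. The numbers depend entirely on the hypothetical defaults and should not be read as results from the paper.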