As the field of fault-tolerant computing matures and its results are put to practical use, the effects of a failure in a computer system need not be catastrophic. With good fault-detection mechanisms, it is now possible to cover a very high percentage of all possible failures. Once a fault is detected, systems are designed to reconfigure and proceed with either full or degraded performance, depending on how much redundancy is built into the system. Note that a particular failure may have different effects depending on the circumstances and the time at which it occurs. Today, large numbers of resources are being tied together in complex computer systems, either locally or in geographically distributed systems and networks. In such systems it is clearly undesirable for the failure of one element to bring the entire system down. On the other hand, one usually cannot afford to design the system with enough redundancy to mask the effect of every failure immediately.