A combined evaluation of performance and reliability for degradable systems

Authors:
Ragnar Huslende
Affiliations:
Electronics Research Laboratory, Norwegian Institute of Technology, University of Trondheim, 7034 Trondheim-NTH, Norway
Venue:
SIGMETRICS '81 Proceedings of the 1981 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Year:
1981

Abstract

As the field of fault-tolerant computing matures and its results are put into practical use, the effects of a failure in a computer system need not be catastrophic. With good fault-detection mechanisms it is now possible to cover a very high percentage of all possible failures. Once a fault is detected, the system is designed to reconfigure and continue with either full or degraded performance, depending on how much redundancy is built into it. Note that a particular failure may have different effects depending on the circumstances and the time at which it occurs. Today, large numbers of resources are being tied together in complex computer systems, either locally or in geographically distributed systems and networks. In such systems it is clearly undesirable that the failure of a single element can bring down the entire system. On the other hand, one usually cannot afford to design the system with enough redundancy to mask the effects of all failures immediately.
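To make the idea of evaluating performance and reliability together more concrete, here is a minimal sketch, not the method of this paper. It assumes a hypothetical two-processor system without repair, modeled as a continuous-time Markov chain whose state is the number of working processors. The per-processor failure rate `LAMBDA`, the coverage probability `COVERAGE` (the chance that a failure is detected and the system reconfigures), and the performance level per state are all illustrative assumptions. An uncovered failure brings the whole system down, while a covered one leaves it running in a degraded state.

```python
# Hypothetical model: a two-processor degradable system without repair.
# State = number of working processors; state 0 (system down) is absorbing.
LAMBDA = 0.001    # assumed per-processor failure rate (failures per hour)
COVERAGE = 0.95   # assumed probability that a failure is detected and handled


def performance(state: int) -> float:
    """Assumed performance level: proportional to the working processors."""
    return float(state)


def transitions(state: int) -> dict[int, float]:
    """Outgoing transition rates from each state of the Markov chain."""
    if state == 2:
        # Covered failure -> degraded state 1; uncovered failure -> system down.
        return {1: 2 * LAMBDA * COVERAGE, 0: 2 * LAMBDA * (1 - COVERAGE)}
    if state == 1:
        return {0: LAMBDA}
    return {}  # state 0 is absorbing


def expected_accumulated(state: int, rate_fn) -> float:
    """Expected reward accumulated before absorption, starting in `state`.

    Valid for nonrepairable (acyclic) chains: the reward earned during the
    mean sojourn time plus the probability-weighted reward of each successor.
    """
    out = transitions(state)
    total = sum(out.values())
    if total == 0.0:
        return 0.0
    sojourn = 1.0 / total
    successors = sum((r / total) * expected_accumulated(n, rate_fn)
                     for n, r in out.items())
    return rate_fn(state) * sojourn + successors


# Reliability measure: mean time to failure (reward rate 1 in every up state).
mttf = expected_accumulated(2, lambda s: 1.0)
# Combined measure: expected total work delivered before the system fails.
work = expected_accumulated(2, performance)
print(f"MTTF: {mttf:.1f} h, expected work: {work:.1f} processor-hours")
```

For this particular chain the results can be checked by hand: the mean time to failure is (1 + 2c)/(2λ) and the expected work is (1 + c)/λ, so with the assumed values both grow with the coverage c, showing how fault detection affects performance as well as reliability.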