This paper develops techniques for generating and using mathematical models applicable to architectural evaluation of the tradeoffs involved in designing self-repairing, highly reliable computers for long missions. These systems must use standby sparing, and their reliability is shown to be extremely sensitive to small variations in a new design parameter, the coverage, c, defined as the probability of system recovery given the existence of a failure. Interactive terminal calculations show c to be the single most important parameter in high-reliability system design. Changing the coverage from 1 to 0.98 can change the system mission time achievable at a specified reliability by orders of magnitude. Most techniques for increasing system reliability (e.g., adding more spares) are shown to be futile when coverage is inadequate, even at a value of 0.99. Examples of system tradeoff evaluation show that adding checking, diagnostics, etc. to improve failure coverage is the most advantageous technique. This mandates extensive application of modeling techniques throughout all computer system design phases.
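The sensitivity to coverage can be illustrated with a minimal sketch. It is not the paper's own model. It assumes a textbook simplification: one active unit with constant failure rate λ, S identical standby spares that cannot fail while dormant, and each failure recovered independently with probability c. Under those assumptions the reliability is R(t) = exp(-λt) · Σ_{k=0..S} (cλt)^k / k!. The function names and the bisection search are hypothetical helpers introduced only for this illustration.

```python
import math


def standby_reliability(lam_t, spares, coverage):
    """Reliability of one active unit plus `spares` cold standby spares.

    Assumed model (not taken from the paper): constant failure rate,
    dormant spares never fail, each failure is recovered with
    probability `coverage`. `lam_t` is the normalized mission time
    (failure rate times mission time).
    """
    total = 0.0
    term = 1.0  # (c * lam_t)^k / k! for k = 0
    for k in range(spares + 1):
        if k > 0:
            term *= coverage * lam_t / k
        total += term
    return math.exp(-lam_t) * total


def mission_time(target, spares, coverage, upper=1e3):
    """Largest normalized mission time with reliability >= target.

    Uses bisection, which is valid because the reliability above is
    non-increasing in lam_t.
    """
    lo, hi = 0.0, upper
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if standby_reliability(mid, spares, coverage) >= target:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    target = 0.99
    for c in (1.0, 0.99, 0.98):
        for s in (1, 2, 4, 8):
            t = mission_time(target, s, c)
            print(f"c={c:.2f} spares={s}: lambda*t = {t:.4f}")
```

In this simplified model the reliability tends to exp(-(1 - c)λt) as the number of spares grows, so with imperfect coverage the achievable mission time is bounded no matter how many spares are added. That is one way to see why adding spares alone cannot compensate for inadequate coverage.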