As computer technologies advance toward higher performance and density, intermittent failures become more dominant than solid failures, which greatly reduces the effectiveness of any diagnostic procedure that relies on reproducing failures. This problem is solved at the system level by a new strategy of dynamic error detection and fault isolation based on error checking and analysis of captured information. The model developed in this paper allows the system designer to project the dynamic error-detection and fault-isolation coverages of a system as a function of the failure rates of its components and the types and placement of its error checkers. Applying the model has resulted in significant improvements to both detection and isolation in the IBM 3081 Processor Unit. The model has also led to new probabilistic isolation strategies based on the likelihood of failures. Our experience with this model on several IBM products, including the 3081, shows good correlation between the model and practical experiments.
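To make the idea of projecting coverage from failure rates and checker placement concrete, here is a minimal sketch under stated assumptions. It is not the model from the paper: the component names, failure rates, checker names, and the three functions are all hypothetical, and it assumes a failing component trips every checker that observes it. Detection coverage is taken as the failure-rate-weighted share of failures seen by at least one checker, isolation coverage as the share of detected failures whose checker signature points to a single component, and suspects are ranked by relative failure rate.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    failure_rate: float   # hypothetical units, e.g. failures per million hours
    checkers: frozenset   # names of the checkers that observe this component


def detection_coverage(components):
    """Failure-rate-weighted fraction of failures seen by at least one checker."""
    total = sum(c.failure_rate for c in components)
    detected = sum(c.failure_rate for c in components if c.checkers)
    return detected / total


def isolation_coverage(components):
    """Fraction of detected failure rate whose checker signature is unique."""
    detected = [c for c in components if c.checkers]
    signatures = Counter(c.checkers for c in detected)
    isolated = sum(c.failure_rate for c in detected if signatures[c.checkers] == 1)
    return isolated / sum(c.failure_rate for c in detected)


def rank_suspects(components, fired):
    """Rank components matching the fired checkers by relative failure rate."""
    fired = frozenset(fired)
    # Simplifying assumption: a failing component trips all of its checkers.
    suspects = [c for c in components if c.checkers == fired]
    total = sum(c.failure_rate for c in suspects)
    ranked = [(c.name, c.failure_rate / total) for c in suspects]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Illustrative data only; none of these values come from the paper.
PARTS = [
    Component("alu", 4.0, frozenset({"parity_a"})),
    Component("register_file", 2.0, frozenset({"parity_a"})),
    Component("bus_driver", 3.0, frozenset({"parity_b"})),
    Component("clock_buffer", 1.0, frozenset()),
]

if __name__ == "__main__":
    print("detection coverage:", detection_coverage(PARTS))
    print("isolation coverage:", isolation_coverage(PARTS))
    print("suspects for parity_a:", rank_suspects(PARTS, {"parity_a"}))
```

With this illustrative data, adding a checker to the unobserved component raises detection coverage, while giving the two components behind `parity_a` distinct checkers raises isolation coverage. That is the kind of placement trade-off a designer could explore with such a model.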