Spare Capacity as a Means of Fault Detection and Diagnosis in Multiprocessor Systems

Authors:
A. T. Dahbura; K. K. Sabnani; W. J. Hery
Venue:
IEEE Transactions on Computers
Year:
1989



Abstract

A technique for detecting and diagnosing faults at the processor level in a multiprocessor system is described. Whenever possible, a process is assigned to two processors: the processor to which it would normally be assigned (the primary) and an additional processor that would otherwise be idle (the secondary). Two strategies, one preemptive and one nonpreemptive, are described and analyzed. It is shown that, for moderately loaded systems, a sufficient percentage of processes can be executed redundantly using the system's spare capacity to provide a basis for fault detection and diagnosis with virtually no degradation of response time. A multiprocessor that uses the approach for detecting faults at the processor level is described.
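The abstract does not spell out the scheduling rules, so the following is only a minimal discrete-time sketch of the primary/secondary idea under assumptions of our own: integer arrival and service times (service of at least 1), a FIFO queue, a secondary copy started only when its process is dispatched and only on a processor still idle after every waiting primary has been placed, "preemptive" taken to mean that a waiting primary may evict a running secondary copy, and a hypothetical fault model in which a faulty processor returns a wrong output. Detection is by comparing the two outputs. The names `simulate` and `Copy` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    jid: int         # index of the process this copy belongs to
    primary: bool    # True for the primary copy, False for the secondary copy
    remaining: int   # integer time units of work still to do


def simulate(jobs, n_procs, preemptive, faulty=frozenset()):
    """Discrete-time sketch of running processes redundantly on spare processors.

    jobs    -- list of (arrival_time, service_time) pairs, integer time units
    faulty  -- processor indices whose outputs are corrupted (hypothetical fault model)
    Returns (fraction of processes run redundantly, mean response time, mismatches).
    """
    procs = [None] * n_procs                      # None means the processor is idle
    waiting = []                                  # FIFO queue of processes awaiting a primary
    order = sorted(range(len(jobs)), key=lambda j: jobs[j][0])
    outputs = {j: [] for j in range(len(jobs))}   # (processor, value) per finished copy
    response, mismatches = {}, []
    t = nxt = 0
    while len(response) < len(jobs):
        while nxt < len(order) and jobs[order[nxt]][0] <= t:
            waiting.append(order[nxt])
            nxt += 1
        started = []
        # Primaries first: each waiting process takes an idle processor or, under
        # the preemptive policy, evicts (aborts) a running secondary copy.
        while waiting:
            free = [p for p in range(n_procs) if procs[p] is None]
            if not free and preemptive:
                free = [p for p in range(n_procs) if not procs[p].primary]
            if not free:
                break                             # nonpreemptive: the process must wait
            j = waiting.pop(0)
            procs[free[0]] = Copy(j, True, jobs[j][1])
            started.append(j)
        # Secondaries only use processors that would otherwise stay idle.
        for j in started:
            free = [p for p in range(n_procs) if procs[p] is None]
            if free:
                procs[free[0]] = Copy(j, False, jobs[j][1])
        # Advance one time unit; every copy that finishes reports its output.
        t += 1
        for p, c in enumerate(procs):
            if c is None:
                continue
            c.remaining -= 1
            if c.remaining == 0:
                value = c.jid + (1 if p in faulty else 0)  # a faulty processor corrupts output
                outputs[c.jid].append((p, value))
                if c.primary:
                    response[c.jid] = t - jobs[c.jid][0]
                procs[p] = None
    # Fault detection: compare outputs of processes that completed on two processors.
    redundant = [j for j, outs in outputs.items() if len(outs) == 2]
    for j in redundant:
        (p1, v1), (p2, v2) = outputs[j]
        if v1 != v2:
            mismatches.append((j, p1, p2))
    return (len(redundant) / len(jobs),
            sum(response.values()) / len(jobs),
            mismatches)
```

With two processors and jobs `[(0, 3), (1, 2)]`, the nonpreemptive run executes both processes redundantly but the second one waits for the secondary copy of the first, giving a mean response time of 3.5. The preemptive run evicts that secondary copy, so no process is duplicated, and the mean response time is 2.5, the same as with no duplication at all. A mismatch entry such as `(0, 0, 1)` names the two processors that disagreed, which is the kind of comparison outcome a diagnosis procedure could then work from.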