A technique for detecting and diagnosing faults at the processor level in a multiprocessor system is described. Whenever possible, a process is assigned to two processors: the processor to which it would normally be assigned (primary) and an additional processor that would otherwise be idle (secondary). Two strategies are described and analyzed: one preemptive and one nonpreemptive. It is shown that, for moderately loaded systems, a sufficient percentage of processes can be performed redundantly using the system's spare capacity to provide a basis for fault detection and diagnosis with virtually no degradation of response time. A multiprocessor that uses the approach for detecting faults at the processor level is described.
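
The primary/secondary assignment described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the names `Processor`, `Scheduler`, `submit`, and `results_agree` are hypothetical, and the sketch assumes that "preemptive" means an arriving process may displace a redundant (secondary) copy when no processor is idle, while "nonpreemptive" means the arriving process waits instead.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Processor:
    pid: int
    job: Optional[str] = None   # name of the process currently running
    role: Optional[str] = None  # "primary", "secondary", or None when idle

    @property
    def idle(self) -> bool:
        return self.job is None


@dataclass
class Scheduler:
    processors: list
    preemptive: bool = True
    waiting: list = field(default_factory=list)

    def submit(self, job: str) -> None:
        idle = [p for p in self.processors if p.idle]
        if idle:
            # Normal assignment: the first idle processor is the primary.
            self._start(idle[0], job, "primary")
            # Spare capacity: duplicate the job on a second idle processor.
            if len(idle) > 1:
                self._start(idle[1], job, "secondary")
            return
        if self.preemptive:
            # Assumption: an arriving job may displace a redundant copy.
            victim = next(
                (p for p in self.processors if p.role == "secondary"), None
            )
            if victim is not None:
                self._start(victim, job, "primary")
                return
        # No idle processor and nothing preempted: the job waits.
        self.waiting.append(job)

    @staticmethod
    def _start(proc: Processor, job: str, role: str) -> None:
        proc.job, proc.role = job, role

    def redundant_jobs(self) -> set:
        """Jobs that currently also run as a secondary copy."""
        return {p.job for p in self.processors if p.role == "secondary"}


def results_agree(primary_result, secondary_result) -> bool:
    """A mismatch indicates a fault in at least one of the two processors."""
    return primary_result == secondary_result
```

In this sketch, a two-processor system running a single process executes it redundantly. When a second process arrives, the preemptive variant gives up the redundant copy to serve it at once, whereas the nonpreemptive variant keeps the copy and queues the new process. Comparing the two results only detects a disagreement; deciding which processor is faulty needs further information.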