Spare Capacity as a Means of Fault Detection and Diagnosis in Multiprocessor Systems
IEEE Transactions on Computers
Introspection: a low overhead binding technique during self-diagnosing microarchitecture synthesis
DAC '96 Proceedings of the 33rd annual Design Automation Conference
Automatic Synthesis of Self-Recovering VLSI Systems
IEEE Transactions on Computers
Partitioned Encoding Schemes for Algorithm-Based Fault Tolerance in Massively Parallel Systems
IEEE Transactions on Parallel and Distributed Systems
Microarchitectural synthesis of gracefully degradable, dynamically reconfigurable ASICs
ICCD '96 Proceedings of the 1996 International Conference on Computer Design, VLSI in Computers and Processors
Configurable Spare Processors: A New Approach to System Level-Fault Tolerance
DFT '96 Proceedings of the 1996 Workshop on Defect and Fault-Tolerance in VLSI Systems
Compiler-assisted generation of error-detecting parallel programs
FTCS '96 Proceedings of the Twenty-Sixth Annual International Symposium on Fault-Tolerant Computing (FTCS '96)
Microarchitectural Synthesis Of ICs With Embedded Concurrent Fault Isolation
FTCS '97 Proceedings of the 27th International Symposium on Fault-Tolerant Computing (FTCS '97)
Behavioral-level synthesis of heterogeneous BISR reconfigurable ASIC's
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Optimal algorithms for recovery point insertion in recoverable microarchitectures
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Design of concurrent test hardware for linear analog circuits with constrained hardware overhead
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
An error recoverable structure based on complementary logic and alternating-retry
Journal of Computer Science and Technology
The authors present a framework for tailoring fault-tolerant approaches, for both permanent and transient faults, to the specific needs of an application. In particular, they address three techniques: methodologies for encoding fault isolation properties in calculation duplication to allow permanent fault identification; an efficient approach to post-identification reconfiguration that uses graceful degradation instead of spares; and an error recovery technique that recovers from previously detected errors in parallel with future calculations, thus achieving zero-error latency. Together, these techniques provide an efficient alternative to traditional triplication and rollback schemes and allow significant tailoring of area/resiliency trade-offs for individual designs.