Shielding systems effectively from transient faults (fault avoidance) is hard, so other means must be employed to ensure an adequate level of transient fault tolerance, that is, insensitivity to transient faults. These means are based on fault masking and fault recovery. Having analyzed this problem, the author identifies critical design points and outlines practical solutions built on efficient on-line detectors, which detect errors during system operation, and on error handling procedures. The resulting framework provides a basis for understanding transient fault problems in digital systems, and it can help in selecting the most suitable techniques to mask or eliminate the effects of transient faults in systems under development.
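The two ideas named above, fault masking and fault recovery, can be illustrated with a minimal Python sketch. It is not taken from the paper: the names `majority_vote`, `run_with_retry`, and `DetectedError`, the retry limit, and the use of a caller-supplied `check` function as a stand-in for an on-line detector are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class DetectedError(Exception):
    """Hypothetical signal that an error could be neither masked nor recovered."""


def majority_vote(results: Sequence[T]) -> T:
    """Fault masking: return the value agreed on by a strict majority of replicas."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise DetectedError("no majority among replicas")
    return value


def run_with_retry(
    operation: Callable[[], T],
    check: Callable[[T], bool],
    max_retries: int = 3,
) -> T:
    """Fault recovery: re-execute the operation until the on-line check passes.

    A transient fault is expected to disappear on re-execution; an error that
    survives every retry is reported instead of being silently accepted.
    """
    for _ in range(max_retries + 1):
        result = operation()
        if check(result):  # assumed on-line detector: True means "no error seen"
            return result
    raise DetectedError("error persisted across retries; fault may not be transient")
```

For instance, `majority_vote([7, 7, 9])` masks the single deviating replica, while `run_with_retry` with an operation that fails its check once and then succeeds returns the second result. Bounding the number of retries is one simple way to stop treating an error as transient; the threshold chosen here is arbitrary.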