Transient (or soft) errors are a major concern in designing and implementing the current generation of highly integrated digital systems. The continual push on the processor performance envelope and the deployment of computer systems in complex mission- and life-critical applications have further increased the significance and impact of these errors. In hardware, they have been handled at the device, circuit, and architectural levels using information redundancy, space redundancy, time redundancy, or a combination of these. This paper analyzes techniques developed at the circuit and architectural levels, in both experimental academic research and industry. Past studies indicate that most low-level errors do not translate into errors in the outcome of the application, which is the primary concern of the user. An alternative paradigm, application-aware runtime checking, is therefore proposed. In this approach, the application is analyzed either statically or through dynamic profiling to extract its reliability-sensitive characteristics. Based on the extracted application properties, hardware checkers/modules are devised and embedded in a processor-level framework to enable runtime error detection and recovery. The architecture of the Illinois Reliability and Security Engine is presented as a possible implementation of such a framework.
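
The idea of application-aware runtime checking can be illustrated with a small software sketch. It is not the Illinois Reliability and Security Engine or any hardware checker from the paper. It is a minimal model that assumes the profiled property is a value range, that a transient fault is a single bit flip in an integer result, and that recovery is re-execution (time redundancy). The names `profile_range`, `flip_bit`, `checked_run`, and `scale` are hypothetical.

```python
def profile_range(fn, inputs):
    """Dynamic profiling: record the min/max output seen on fault-free runs."""
    outputs = [fn(x) for x in inputs]
    return min(outputs), max(outputs)


def flip_bit(value, bit):
    """Model a transient fault as a single bit flip in an integer result."""
    return value ^ (1 << bit)


def checked_run(fn, x, valid_range, fault_bit=None, max_retries=1):
    """Run fn(x); if the result violates the profiled range, re-execute.

    Returns (result, attempt), where attempt is the index of the run
    whose result passed the check (0 means no retry was needed).
    """
    lo, hi = valid_range
    for attempt in range(max_retries + 1):
        result = fn(x)
        # Inject the fault only on the first attempt: a transient fault
        # is assumed not to persist across re-execution.
        if fault_bit is not None and attempt == 0:
            result = flip_bit(result, fault_bit)
        if lo <= result <= hi:
            return result, attempt
    raise RuntimeError("range check still failing after retries")


def scale(x):
    """Hypothetical application kernel whose output is always in 0..99."""
    return x % 100


# Profile the kernel on fault-free inputs to extract its output range.
valid = profile_range(scale, range(1000))

# A high-order flip leaves the profiled range, so it is detected and
# recovered by re-execution.
detected = checked_run(scale, 42, valid, fault_bit=10)

# A low-order flip stays inside the range and escapes this checker.
missed = checked_run(scale, 42, valid, fault_bit=0)
```

In this sketch the flip of bit 10 is caught and corrected on the retry, while the flip of bit 0 passes unnoticed. A range invariant is only one possible application property, and its coverage depends on how tightly the profiled range bounds the fault-free behaviour.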