Transient Fault Tolerance for ccNUMA Architecture
IMIS '12 Proceedings of the 2012 Sixth International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing
Transient faults are a critical concern for the reliability of microprocessor systems. Software fault tolerance is more flexible and less costly than hardware fault tolerance. Moreover, as architectural trends point toward multicore designs, there is substantial interest in exploiting parallel and redundant hardware resources for transient fault tolerance. This paper proposes a process-level, software-centric fault tolerance technique that efficiently schedules and synchronizes redundant processes across the redundant processors of a ccNUMA system. This improves the efficiency with which the redundant processes run and reduces time and space overhead. The paper focuses on error detection and handling for the redundant processes. A real prototype, designed to be transparent to the application, has been implemented. Test results show that the system promptly detects CPU and memory soft errors that cause exceptions in the redundant processes, while keeping the application's services uninterrupted and incurring only short delays.
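The abstract does not describe how the redundant processes are scheduled, synchronized, or compared, so the following is only a minimal sketch of the general process-level redundancy idea, not the paper's prototype. It assumes a POSIX system (it relies on `os.fork`), compares the replicas' pickled results once by majority vote after all of them finish, and makes no attempt at ccNUMA processor or memory placement. The names `redundant_execute` and `_replica`, and the `inject` hook used to simulate a soft error, are hypothetical and exist only for illustration.

```python
import os
import pickle
from collections import Counter


def _replica(fn, args, corrupt, w):
    """Body of one redundant process; it never returns to the caller."""
    try:
        try:
            value = fn(*args)
            if corrupt is not None:
                value = corrupt(value)  # hypothetical hook: simulated soft error
            out = ("ok", value)
        except Exception as exc:  # an exception in a replica is a detected fault
            out = ("error", repr(exc))
        with os.fdopen(w, "wb") as f:
            f.write(pickle.dumps(out))
    finally:
        os._exit(0)  # leave the child without running the parent's code


def redundant_execute(fn, args=(), replicas=3, inject=None):
    """Run fn(*args) in `replicas` forked processes and vote on the results.

    inject: optional (replica_index, corrupt_fn) pair, used only to simulate
    a soft error in one replica. Returns (majority_value, faulty_indices).
    """
    pids, readers = [], []
    for i in range(replicas):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:  # child: compute and report through the pipe
            os.close(r)
            corrupt = inject[1] if inject and inject[0] == i else None
            _replica(fn, args, corrupt, w)
        os.close(w)  # parent keeps only the read end
        pids.append(pid)
        readers.append(r)

    raw = []
    for r, pid in zip(readers, pids):
        with os.fdopen(r, "rb") as f:
            raw.append(f.read())  # empty if the replica died before reporting
        os.waitpid(pid, 0)

    votes = Counter(b for b in raw if b)
    if not votes:
        raise RuntimeError("all replicas failed")
    winner, count = votes.most_common(1)[0]
    if count <= replicas // 2:
        raise RuntimeError("no majority: fault detected but not correctable")
    status, value = pickle.loads(winner)
    if status != "ok":
        raise RuntimeError(f"majority of replicas raised: {value}")
    faulty = [i for i, b in enumerate(raw) if b != winner]
    return value, faulty
```

For example, `redundant_execute(sum, ([1, 2, 3],), inject=(1, lambda v: v ^ 0x4))` flips a bit in replica 1's result. The other two replicas outvote it, so the call still returns `6` and reports replica 1 as faulty. With three replicas a single faulty process can be masked. With two, a mismatch can only be detected, which is why the sketch raises when no strict majority exists.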