Watchdog processors in parallel systems
Microprocessing and Microprogramming - Special quadruple issue: short notes from EUROMICRO 93
A comparison of list schedules for parallel processing systems
Communications of the ACM
Real-Time Systems: Design Principles for Distributed Embedded Applications
Real-Time Systems: Design Principles for Distributed Embedded Applications
Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing
IEEE Transactions on Parallel and Distributed Systems
Benchmarking the Task Graph Scheduling Algorithms
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Soft-Error Detection Using Control Flow Assertions
DFT '03 Proceedings of the 18th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems
Error Detection Enhancement in COTS Superscalar Processors with Event Monitoring Features
PRDC '04 Proceedings of the 10th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC'04)
Techniques to Reduce the Soft Error Rate of a High-Performance Microprocessor
Proceedings of the 31st annual international symposium on Computer architecture
Design Optimization of Time-and Cost-Constrained Fault-Tolerant Distributed Embedded Systems
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
A Software-Based Concurrent Error Detection Technique for PowerPC Processor-based Embedded Systems
DFT '05 Proceedings of the 20th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems
A Hardware Approach to Concurrent Error Detection Capability Enhancement in COTS Processors
PRDC '05 Proceedings of the 11th Pacific Rim International Symposium on Dependable Computing
Proceedings of the conference on Design, automation and test in Europe: Proceedings
A Software-Based Error Detection Technique Using Encoded Signatures
DFT '06 Proceedings of the 21st IEEE International Symposium on on Defect and Fault-Tolerance in VLSI Systems
The use of triple-modular redundancy to improve computer reliability
IBM Journal of Research and Development
Analysis and optimization of fault-tolerant embedded systems with hardened processors
Proceedings of the Conference on Design, Automation and Test in Europe
Transparent recovery from intermittent faults in time-triggered distributed systems
IEEE Transactions on Computers
Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
Hi-index | 0.00 |
Soft errors and errors caused by intermittent faults are a major concern for modern processors. In this paper we provide a drastically different approach for fault tolerant scheduling (FTS) of tasks in such processors. Traditionally in FTS, error detection is performed implicitly and concurrently with task execution, and associated overheads are incurred as increases in software run-time or hardware area. However, such embedded error detection (EED) techniques, e.g., watchdog processor assisted control flow checking, only provide approximately 70% error coverage [1, 2]. We propose the idea of utilizing straightforward explicit output comparison (EOC) which provides nearly 100% error coverage. We construct a framework for utilizing EOC in FTS, identify new challenges and tradeoffs, and develop a new off-line scheduling algorithm for EOC. We show that our EOC based approach provides higher error coverage and an average performance improvement of nearly 10% over EED-based FTS approaches, without increasing resource requirements. In our ongoing research we are identifying a richer set of ways of applying EOC, by itself and in conjunction with EED, to obtain further improvements.