Transient faults will soon become a critical reliability concern for processors used in mainstream computing. Because the mainstream commodity market accepts only low-cost solutions for transient-fault tolerance, traditional high-end solutions are ruled out by their prohibitive cost. This paper presents Epipe, a hybrid software/hardware solution that provides sufficient fault coverage with affordable overhead for mainstream commodity systems. Given a program, Epipe uses compile-time analysis to identify its vulnerable instructions (VIs), i.e., those that may cause silent data corruptions (SDCs), and selects a subset of VIs to protect subject to worst-case execution time (WCET) constraints on the fault-free execution. During program execution on a modified superscalar processor that incurs minimal hardware overhead, Epipe relies on selective instruction replication to handle VI-induced SDCs and on an existing exception detector to tolerate the remaining faults that manifest as system exceptions. Our experimental results show that Epipe provides sufficient fault coverage under some tight WCET constraints and progressively higher coverage as the constraints are relaxed: as the WCET allowance increases from 5% to 15% and then to 25%, the average coverage increases from 70.8% to 80% and then to 86.6%. Unlike existing hybrid solutions, Epipe is the first to respect WCET constraints, which are an important concern for real-time systems.
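The abstract does not say how the subset of VIs is chosen, so the following is only a minimal sketch of the general idea of budgeted selection, not Epipe's actual algorithm. It assumes each VI carries a hypothetical estimate of its SDC contribution (`sdc_weight`) and a hypothetical worst-case cycle cost of replicating it (`wcet_cost`), and it uses a simple greedy benefit-per-cost heuristic. All names, fields, and numbers are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VulnerableInstruction:
    """Hypothetical record for one VI; none of these fields come from the paper."""
    name: str          # illustrative label
    sdc_weight: float  # assumed estimate of how much this VI contributes to SDCs
    wcet_cost: int     # assumed extra worst-case cycles if replicated (must be > 0)


def select_vis(vis, baseline_wcet, allowance_pct):
    """Greedily pick VIs to protect without exceeding the WCET allowance.

    The budget is allowance_pct percent of the baseline WCET, in cycles.
    VIs are considered in order of decreasing sdc_weight per cycle of cost.
    """
    budget = baseline_wcet * allowance_pct // 100
    ranked = sorted(vis, key=lambda v: v.sdc_weight / v.wcet_cost, reverse=True)
    chosen, spent = [], 0
    for v in ranked:
        if spent + v.wcet_cost <= budget:
            chosen.append(v)
            spent += v.wcet_cost
    return chosen, spent


def coverage(chosen, vis):
    """Fraction of the total assumed SDC weight that the chosen VIs account for."""
    total = sum(v.sdc_weight for v in vis)
    return sum(v.sdc_weight for v in chosen) / total if total else 0.0


if __name__ == "__main__":
    # Made-up instructions and costs, purely for illustration.
    vis = [
        VulnerableInstruction("A", 0.5, 10),
        VulnerableInstruction("B", 0.3, 30),
        VulnerableInstruction("C", 0.2, 5),
    ]
    for pct in (5, 15, 50):
        chosen, spent = select_vis(vis, baseline_wcet=100, allowance_pct=pct)
        names = [v.name for v in chosen]
        print(pct, names, spent, round(coverage(chosen, vis), 2))
```

In this toy setup a larger allowance never lowers the protected share, which mirrors the trend reported above. A greedy pass is not optimal in general; an exact formulation would treat the choice as a knapsack-style optimization.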