Fast hashing of variable-length text strings
Communications of the ACM
The SPARC architecture manual (version 9)
The SPARC architecture manual (version 9)
LCLint: a tool for using specifications to check code
SIGSOFT '94 Proceedings of the 2nd ACM SIGSOFT symposium on Foundations of software engineering
Dynamically Discovering Likely Program Invariants to Support Program Evolution
IEEE Transactions on Software Engineering - Special issue on 1999 international conference on software engineering
Bugs as deviant behavior: a general approach to inferring errors in systems code
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
ESP: path-sensitive program verification in polynomial time
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Tracking down software bugs using automatic anomaly detection
Proceedings of the 24th International Conference on Software Engineering
Executable Assertions for Detecting Data Errors in Embedded Control Systems
DSN '00 Proceedings of the 2000 International Conference on Dependable Systems and Networks (formerly FTCS-30 and DCCA-8)
Modeling the Effect of Technology Trends on the Soft Error Rate of Combinational Logic
DSN '02 Proceedings of the 2002 International Conference on Dependable Systems and Networks
Synthesis And Optimization Of DSP Algorithms
Synthesis And Optimization Of DSP Algorithms
Characterizing the Effects of Transient Faults on a High-Performance Processor Pipeline
DSN '04 Proceedings of the 2004 International Conference on Dependable Systems and Networks
Multi-media Applications and Imprecise Computation
DSD '05 Proceedings of the 8th Euromicro Conference on Digital System Design
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Software-controlled fault tolerance
ACM Transactions on Architecture and Code Optimization (TACO)
ReStore: Symptom-Based Soft Error Detection in Microprocessors
IEEE Transactions on Dependable and Secure Computing
Dynamic Derivation of Application-Specific Error Detectors and their Implementation in Hardware
EDCC '06 Proceedings of the Sixth European Dependable Computing Conference
Enhancing server availability and security through failure-oblivious computing
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Automated Derivation of Application-aware Error Detectors using Static Analysis
IOLTS '07 Proceedings of the 13th IEEE International On-Line Testing Symposium
Application-Level Correctness and its Impact on Fault Tolerance
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Efficient fault tolerance in multi-media applications through selective instruction replication
Proceedings of the 2008 workshop on Radiation effects and fault tolerance in nanometer technologies
Fool me twice: Exploring and exploiting error tolerance in physics-based animation
ACM Transactions on Graphics (TOG)
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Shoestring: probabilistic soft error reliability on the cheap
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Leveraging variable function resilience for selective software reliability on unreliable hardware
Proceedings of the Conference on Design, Automation and Test in Europe
Exploiting program-level masking and error propagation for constrained reliability optimization
Proceedings of the 50th Annual Design Automation Conference
Analysis and characterization of inherent application resilience for approximate computing
Proceedings of the 50th Annual Design Automation Conference
Quality programmable vector processors for approximate computing
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Journal of Electronic Testing: Theory and Applications
Hi-index | 0.00 |
Traditionally, research in fault tolerance has required architectural state to be numerically perfect for program execution to be correct. However, in many programs, even if execution is not 100% numerically correct, the program can still appear to execute correctly from the user's perspective. To quantify user satisfaction, application-level fidelity metrics (such as PSNR) can be used. The output for such applications is defined to be correct if the fidelity metrics satisfy a certain threshold. However, such applications still contain instructions whose outputs are critical -- i. e. their correctness decides if the overall quality of the program output is acceptable. In this paper, we present an analysis technique for identifying such critical program segments. More importantly, our technique is capable of guaranteeing application-level correctness through a combination of static analysis and runtime monitoring. Our static analysis consists of data flow analysis followed by control flow analysis to find static critical instructions which affect several instructions. Critical instructions are further refined into likely non-critical and likely critical sets in a profiling phase. At runtime, we use a monitoring scheme to monitor likely non-critical instructions and take remedial actions if some likely non-critical instructions become critical. Based on this analysis, we minimize the number of instructions that are duplicated and checked at runtime using a software-based fault detection and recovery technique [20]. Put together, our approach can lead to 22% average energy savings for multimedia applications while guaranteeing application-level correctness, when compared to a recent work [9], which cannot guarantee application-level correctness. Comparing to the approach proposed in [20] which guarantees both application-level and numerical correctness, our method achieves 79% energy reduction.