Fault Injection Techniques and Tools

Authors:
Mei-Chen Hsueh;Timothy K. Tsai;Ravishankar K. Iyer
Venue:
Computer
Year:
1997



Abstract

Dependability evaluation involves the study of failures and errors. The destructive nature of a crash and long error latency make it difficult to identify the causes of failures in the operational environment. It is particularly hard to recreate a failure scenario for a large, complex system. To identify and understand potential failures, the authors use an experiment-based approach for studying system dependability. This approach is applied during the conception, design, prototype, and operational phases. To take an experiment-based approach, you must first understand a system's architecture, structure, and behavior. You need to know its tolerance for faults and failures, including its built-in detection and recovery mechanisms, and you need specific instruments and tools to inject faults, create failures or errors, and monitor their effects. Engineers most often use low-cost, simulation-based fault injection to evaluate the dependability of a system that is in the conceptual and design phases. At this point, the system under study is only a series of high-level abstractions; implementation details have yet to be determined. Thus the system is simulated on the basis of simplified assumptions. Simulation-based fault injection, which assumes that errors or failures occur according to a predetermined distribution, is useful for evaluating the effectiveness of fault-tolerant mechanisms and a system's dependability; it also provides timely feedback to system engineers. However, it requires accurate input parameters, which are difficult to supply: design and technology changes often complicate the use of past measurements. Testing a prototype, on the other hand, allows you to evaluate the system without any assumptions about system design.
Instead of injecting faults, engineers can directly measure operational systems as they handle real workloads. Measurement-based analysis uses actual data, which contains much information about naturally occurring errors and failures and sometimes about recovery attempts. Although these three experimental methods have limitations, their unique values complement one another and allow for a wide spectrum of dependability studies.
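
As a rough illustration of simulation-based fault injection under a predetermined distribution, the sketch below injects faults into an abstract system model and estimates the probability that it survives a mission. It is not taken from the article. The exponential inter-arrival times, the single "coverage" probability standing in for detection and recovery, and all function and parameter names are assumptions made for this example.

```python
import math
import random


def simulate_mission(fault_rate, coverage, mission_time, rng):
    """Simulate one mission; return True if the modeled system survives.

    Assumptions (illustrative only): faults arrive with exponentially
    distributed inter-arrival times, and each fault is detected and
    recovered with probability `coverage`, otherwise the system fails.
    """
    clock = 0.0
    while True:
        # Inject the next fault after an exponentially distributed delay.
        clock += rng.expovariate(fault_rate)
        if clock >= mission_time:
            return True   # no further fault arrives before the mission ends
        if rng.random() >= coverage:
            return False  # the fault escaped detection/recovery


def estimate_reliability(fault_rate, coverage, mission_time, runs=20000, seed=1):
    """Estimate mission reliability as the fraction of surviving runs."""
    rng = random.Random(seed)
    survived = sum(
        simulate_mission(fault_rate, coverage, mission_time, rng)
        for _ in range(runs)
    )
    return survived / runs


def analytic_reliability(fault_rate, coverage, mission_time):
    """Closed form for this simple model.

    Uncovered faults form a Poisson process with rate
    fault_rate * (1 - coverage), so survival is the probability
    of zero uncovered faults during the mission.
    """
    return math.exp(-fault_rate * (1.0 - coverage) * mission_time)


if __name__ == "__main__":
    est = estimate_reliability(fault_rate=0.01, coverage=0.9, mission_time=1000.0)
    ref = analytic_reliability(fault_rate=0.01, coverage=0.9, mission_time=1000.0)
    print(f"simulated: {est:.3f}  analytic: {ref:.3f}")
```

The result is only as good as `fault_rate` and `coverage`, which is the limitation the abstract points out: such input parameters are hard to supply accurately when design and technology changes make past measurements less applicable.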