Computer
Dependability evaluation involves the study of failures and errors. The destructive nature of a crash and long error latency make it difficult to identify the causes of failures in the operational environment. It is particularly hard to recreate a failure scenario for a large, complex system. To identify and understand potential failures, the authors use an experiment-based approach for studying system dependability. This approach is applied during the conception, design, prototype, and operational phases. To take an experiment-based approach, you must first understand a system's architecture, structure, and behavior. You need to know its tolerance for faults and failures, including its built-in detection and recovery mechanisms, and you need specific instruments and tools to inject faults, create failures or errors, and monitor their effects.

Engineers most often use low-cost, simulation-based fault injection to evaluate the dependability of a system that is still in the conceptual and design phases. At this point, the system under study is only a series of high-level abstractions; implementation details have yet to be determined. The system is therefore simulated on the basis of simplified assumptions. Simulation-based fault injection, which assumes that errors or failures occur according to a predetermined distribution, is useful for evaluating the effectiveness of fault-tolerant mechanisms and a system's dependability, and it provides timely feedback to system engineers. However, it requires accurate input parameters, which are difficult to supply: design and technology changes often complicate the use of past measurements. Testing a prototype, on the other hand, allows you to evaluate the system without any assumptions about system design.
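As a rough illustration of simulation-based fault injection under a predetermined distribution, the following Python sketch estimates the probability that an abstract system survives a mission. Everything in it is an assumption made for illustration and is not taken from the article: faults are assumed to arrive as a Poisson process with rate `fault_rate`, each fault is assumed to be detected and recovered independently with probability `coverage`, and the function names are hypothetical.

```python
import random


def simulate_mission(mission_hours, fault_rate, coverage, rng):
    """Simulate one mission; return True if the system survives.

    Assumptions (illustrative only): faults arrive with exponentially
    distributed inter-arrival times (rate `fault_rate` per hour), and each
    fault is detected and recovered with probability `coverage`.
    """
    t = rng.expovariate(fault_rate)  # time of the first injected fault
    while t < mission_hours:
        if rng.random() >= coverage:
            return False  # fault escaped detection/recovery: system fails
        t += rng.expovariate(fault_rate)  # schedule the next fault
    return True


def estimate_reliability(trials, mission_hours, fault_rate, coverage, seed=0):
    """Monte Carlo estimate of the mission survival probability."""
    rng = random.Random(seed)
    survived = sum(
        simulate_mission(mission_hours, fault_rate, coverage, rng)
        for _ in range(trials)
    )
    return survived / trials


if __name__ == "__main__":
    # Hypothetical input parameters; real studies need measured values.
    print(estimate_reliability(20_000, mission_hours=100.0,
                               fault_rate=0.01, coverage=0.9))
```

Under these assumptions the estimate should approach exp(-fault_rate * (1 - coverage) * mission_hours), which offers a sanity check on the simulator. The sketch also shows why the input parameters matter: the result depends entirely on the assumed fault rate and coverage.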
Instead of injecting faults, engineers can directly measure operational systems as they handle real workloads. Measurement-based analysis uses actual data, which contains much information about naturally occurring errors and failures and sometimes about recovery attempts. Although each of these three experimental methods (simulation-based fault injection, prototype-based fault injection, and measurement-based analysis) has limitations, their unique values complement one another and allow for a wide spectrum of dependability studies.
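To make the measurement-based idea concrete, here is a minimal Python sketch that summarizes a field error log. The record layout (`ErrorRecord`), the field names, and the simple estimators are hypothetical choices for illustration; the article does not prescribe a log format or these formulas.

```python
from dataclasses import dataclass


@dataclass
class ErrorRecord:
    """One logged error from an operational system (hypothetical format)."""
    time_h: float    # hours since the start of the observation window
    recovered: bool  # True if the system's recovery attempt succeeded


def summarize(records, observation_hours):
    """Return simple dependability statistics for an observation window.

    recovery_ratio: fraction of logged errors that were recovered.
    mtbf_hours: observation time divided by the number of unrecovered
    errors (failures), a common point estimate of mean time between failures.
    """
    errors = len(records)
    failures = sum(1 for r in records if not r.recovered)
    return {
        "errors": errors,
        "failures": failures,
        "recovery_ratio": (errors - failures) / errors if errors else None,
        "mtbf_hours": observation_hours / failures if failures else None,
    }


if __name__ == "__main__":
    # Made-up sample data, only to show the call shape.
    log = [
        ErrorRecord(12.5, True),
        ErrorRecord(140.0, True),
        ErrorRecord(391.2, False),
        ErrorRecord(806.7, True),
    ]
    print(summarize(log, observation_hours=1000.0))
```

Estimates of this kind are only as good as the log: errors that were never recorded, or recoveries that were not logged, bias the result, which is one of the limitations of measurement-based analysis.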