This paper presents the rationale for DEPEND, a functional simulation tool that provides an integrated design and fault injection environment for system-level dependability analysis. It discusses the problems involved in developing such a tool and describes how DEPEND addresses them. It presents techniques developed to simulate realistic fault scenarios, to curb the explosion in simulation time, and to handle the large fault model and component domain associated with system-level analysis. Examples motivate and illustrate the benefits of the tool. To further illustrate its capabilities, DEPEND is used to simulate a Unix-based Tandem prototype fault-tolerant system built on triple modular redundancy (TMR) and to evaluate how well it handles near-coincident errors caused by correlated and latent faults. Issues that affect how the system handles near-coincident errors, such as memory scrubbing, re-integration policies, and workload-dependent repair times, are also evaluated. Unlike other simulation-based dependability studies, this one validates the accuracy of the simulation model by comparing simulation results with measurements from fault injection experiments conducted on a production Tandem machine.
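
To make the notion of a near-coincident error concrete, the following is a minimal Monte Carlo sketch of a TMR system in which a second module fails while the first is still being re-integrated. It is not DEPEND's model or interface. The function names, the exponential fault arrivals, and the fixed repair time are all assumptions introduced here for illustration only.

```python
import random


def simulate_tmr(mission_time, fault_rate, repair_time, rng):
    """Return True if a toy TMR system survives one mission.

    Illustrative assumptions (not taken from the paper): each healthy module
    fails independently with exponential rate `fault_rate`; a failed module
    is re-integrated after a fixed `repair_time`; the system fails when a
    second module fails while another one is still down.
    """
    if fault_rate <= 0:
        return True  # no faults are ever injected
    now = 0.0
    while True:
        # Time until the first of the three healthy modules fails.
        now += rng.expovariate(3 * fault_rate)
        if now >= mission_time:
            return True
        # While one module is down, the two remaining modules are exposed.
        second = rng.expovariate(2 * fault_rate)
        if second < repair_time and now + second < mission_time:
            return False  # near-coincident error: the majority is lost
        # The failed module is re-integrated; all three are healthy again.
        now += repair_time
        if now >= mission_time:
            return True


def estimate_unreliability(runs, mission_time, fault_rate, repair_time, seed=0):
    """Estimate the probability of system failure over `runs` missions."""
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    failures = sum(
        not simulate_tmr(mission_time, fault_rate, repair_time, rng)
        for _ in range(runs)
    )
    return failures / runs


if __name__ == "__main__":
    for repair in (1.0, 100.0):
        estimate = estimate_unreliability(20000, 1000.0, 0.001, repair)
        print(f"repair_time={repair}: estimated unreliability {estimate:.4f}")
```

Under these assumptions, a longer repair time widens the window in which a second fault can arrive. This is a simplified stand-in for the effect of re-integration policies and workload-dependent repair times named in the abstract.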