Validating the dependability of distributed systems via fault injection is gaining importance because distributed systems are increasingly used in environments with high dependability requirements. The fact that distributed systems can fail in subtle ways that depend on the state of multiple parts of the system suggests that a global-state-based fault injection mechanism should be used to validate them. However, global-state-based fault injection is challenging, since it is very difficult in practice to maintain the global state of a distributed system at runtime with minimal intrusion into the system's execution. This paper presents Loki, a global-state-based fault injector designed with the goals of low intrusion, high precision, and high flexibility. Loki achieves these goals by combining three ideas: a partial view of global state, optimistic synchronization, and offline analysis. In Loki, faults are injected based on a partial view of the global state of the system, and a post-runtime analysis places events and injections on a single global timeline and discards experiments with incorrect fault injections. Finally, the experiments with correct fault injections are used to estimate user-specified performance and dependability measures. A flexible measure language has been designed to facilitate the specification of a wide range of measures.
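The post-runtime step described above can be illustrated with a minimal sketch. It is not Loki's actual algorithm or API: it assumes that each locally timestamped event can be mapped to an interval of possible global times using known bounds on the clock offset, and that an injection counts as correct only if it provably falls between entering and leaving the intended global state. All names (`Interval`, `Experiment`, `to_global`, `injection_is_correct`, `keep_correct`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Interval:
    """Range of possible global times for one locally timestamped event."""
    lo: float
    hi: float


def to_global(local_time: float, offset_lo: float, offset_hi: float) -> Interval:
    """Map a local timestamp to global time, given assumed clock-offset bounds."""
    return Interval(local_time + offset_lo, local_time + offset_hi)


@dataclass(frozen=True)
class Experiment:
    name: str
    state_entered: Interval  # system entered the intended global state
    state_left: Interval     # system left the intended global state
    injected: Interval       # fault was injected


def injection_is_correct(exp: Experiment) -> bool:
    """Conservative check: the injection must lie strictly after the latest
    possible entry time and strictly before the earliest possible exit time."""
    return (exp.state_entered.hi < exp.injected.lo
            and exp.injected.hi < exp.state_left.lo)


def keep_correct(experiments: List[Experiment]) -> List[Experiment]:
    """Discard experiments whose injection cannot be proven correct."""
    return [e for e in experiments if injection_is_correct(e)]


if __name__ == "__main__":
    entered = to_global(10.0, -0.5, 0.5)
    left = to_global(20.0, -0.5, 0.5)
    runs = [
        Experiment("good", entered, left, to_global(15.0, -0.2, 0.2)),
        Experiment("late", entered, left, to_global(19.8, -0.2, 0.2)),
    ]
    for e in keep_correct(runs):
        print(e.name)
```

Only the experiments that survive this filter would feed into the estimation of measures. The check is deliberately conservative: an injection that may have been correct but cannot be proven so under the assumed offset bounds is discarded as well.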