Validating distributed systems is particularly difficult because failures may result from correlated faults in different parts of the system. This paper describes the basis for, and a preliminary implementation of, a new fault injector called Loki, developed specifically for distributed systems. Loki addresses the problem of injecting correlated faults: fault injection is performed based on a partial view of the global state of an application. In particular, facilities are provided to pass user-specified state information between nodes, giving each node the partial view of the global state it needs to attempt complex fault injections. A post-runtime analysis, using off-line clock synchronization and a bounding technique, places events and injections on a single global timeline and determines whether the intended faults were properly injected. Finally, observations containing successful fault injections are used to estimate specified dependability measures. In addition to describing the details of our new approach, we present experimental results obtained from a preliminary implementation to illustrate Loki's ability to inject complex faults predictably.
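The post-runtime check described in the abstract can be illustrated with a minimal sketch. It is not Loki's actual algorithm or API: the names `Interval`, `to_global`, and `injection_verified` are hypothetical, and the sketch assumes that off-line synchronization has already produced lower and upper bounds on each node's clock offset relative to a common reference (clock drift is ignored for simplicity). Each local timestamp then maps to an interval on the global timeline, and an injection is accepted only if it certainly fell inside the intended global state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Bounds on the global time at which a locally timestamped event occurred."""
    lo: float
    hi: float


def to_global(local_time: float, offset_lo: float, offset_hi: float) -> Interval:
    # Assumption: global = local + offset, with offset known only to lie
    # within [offset_lo, offset_hi] after off-line synchronization.
    return Interval(local_time + offset_lo, local_time + offset_hi)


def injection_verified(state_enter: Interval, state_exit: Interval,
                       injection: Interval) -> bool:
    # Conservative test: the latest possible state entry must precede the
    # earliest possible injection, and the latest possible injection must
    # precede the earliest possible state exit.
    return state_enter.hi <= injection.lo and injection.hi <= state_exit.lo


# Hypothetical data: node A holds the target state from local time 10.0 to 20.0;
# its clock offset is bounded by [-0.5, 0.5].
enter = to_global(10.0, -0.5, 0.5)
leave = to_global(20.0, -0.5, 0.5)

# Node B injects faults at local times 15.0 and 9.0; its offset is in [1.0, 1.2].
good = to_global(15.0, 1.0, 1.2)
uncertain = to_global(9.0, 1.0, 1.2)

print(injection_verified(enter, leave, good))
print(injection_verified(enter, leave, uncertain))
```

Under these assumptions, the second injection is rejected because its interval overlaps the uncertainty around the state entry, so the analysis cannot prove it occurred in the intended state. Discarding such runs is what keeps only provably correct injections in the data used to estimate dependability measures.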