Simulation-based Testing of Communication Protocols for Dependable Embedded Systems
The Journal of Supercomputing - Special issue on embedded fault-tolerance systems
Cesium: Testing Hard Real-time and Dependability Properties of Distributed Protocols
WORDS '97 Proceedings of the 3rd Workshop on Object-Oriented Real-Time Dependable Systems - (WORDS '97)
A Global-State-Triggered Fault Injector for Distributed System Evaluation
IEEE Transactions on Parallel and Distributed Systems
Using failure injection mechanisms to experiment and evaluate a grid failure detector
VECPAR'06 Proceedings of the 7th international conference on High performance computing for computational science
We describe a centralized approach to testing whether distributed fault-tolerant protocols satisfy their safety and timeliness specifications in the presence of the very failures they are designed to tolerate. Cesium is a testing environment based on the centralized simulation of distributed executions and failures. Processes are run in a single address space while providing the appearance of a truly distributed execution. The human tester can force the occurrence of arbitrary failures and security attacks. The implementations under test are not instrumented for testing purposes, and their source code need not be available. We prove that Cesium can execute exactly the set of runs feasible in the real distributed system being simulated. We also show that there are safety and timeliness properties in the specifications of many existing distributed protocols that cannot be tested in practical distributed systems. All of these properties can, however, be accurately tested by Cesium without introducing any perturbation in test experiments.
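The general idea of centralized simulation with forced failures can be illustrated with a small sketch. This is not Cesium's implementation or interface: the `Simulator` class, the `fault_hook` callback, the ping/ack protocol, and the one-time-unit message delay are all hypothetical names and values chosen for illustration. The sketch runs two processes in a single address space under a simulated clock, lets the tester delay or drop messages, and reports when the acknowledgement arrives so that a timeliness bound can be checked.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    """A message delivery, ordered by simulated time and then by send order."""
    time: float
    seq: int
    dst: str = field(compare=False)
    src: str = field(compare=False)
    payload: str = field(compare=False)


class Simulator:
    """Hypothetical centralized simulator: every process is a handler
    function in one address space, driven by a single event queue."""

    def __init__(self, fault_hook=None):
        self.now = 0.0
        self.queue = []
        self.handlers = {}
        self.seq = 0
        self.fault_hook = fault_hook  # tester-supplied failure injector
        self.log = []                 # (time, src, dst, payload) per delivery

    def add_process(self, name, handler):
        self.handlers[name] = handler

    def send(self, src, dst, payload, delay=1.0):
        ev = Event(self.now + delay, self.seq, dst, src, payload)
        self.seq += 1
        if self.fault_hook is not None:
            ev = self.fault_hook(ev)  # may modify the event or return None
        if ev is not None:
            heapq.heappush(self.queue, ev)

    def run(self):
        while self.queue:
            ev = heapq.heappop(self.queue)
            self.now = ev.time        # the clock is simulated, not real
            self.log.append((ev.time, ev.src, ev.dst, ev.payload))
            self.handlers[ev.dst](self, ev)


def server(sim, ev):
    # Protocol under test (assumed): answer every ping with an ack.
    if ev.payload == "ping":
        sim.send("server", ev.src, "ack")


def client(sim, ev):
    # The client only receives; arrival times are read from the log.
    pass


def delay_acks(ev):
    """Injected timing failure: every ack arrives 5 time units late."""
    if ev.payload == "ack":
        return Event(ev.time + 5.0, ev.seq, ev.dst, ev.src, ev.payload)
    return ev


def drop_acks(ev):
    """Injected omission failure: every ack is lost."""
    return None if ev.payload == "ack" else ev


def run_experiment(fault_hook=None):
    """Return the simulated time at which the ack arrives, or None."""
    sim = Simulator(fault_hook)
    sim.add_process("server", server)
    sim.add_process("client", client)
    sim.send("client", "server", "ping")
    sim.run()
    acks = [t for (t, _src, _dst, p) in sim.log if p == "ack"]
    return acks[0] if acks else None


DEADLINE = 3.0  # assumed timeliness bound for this toy protocol

if __name__ == "__main__":
    for name, hook in [("none", None), ("delay", delay_acks), ("drop", drop_acks)]:
        t = run_experiment(hook)
        ok = t is not None and t <= DEADLINE
        print(f"fault={name}: ack at {t}, deadline met: {ok}")
```

Because time advances only when the simulator dequeues an event, the test harness itself adds no delay to the measured execution, which is the sense in which such a setup avoids perturbing the experiment. The handlers are ordinary functions called through the queue, so the protocol code carries no test-specific instrumentation.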