Centralized Failure Injection for Distributed, Fault-Tolerant Protocol Testing

  • Authors:
  • Guillermo A. Alvarez; Flaviu Cristian

  • Venue:
  • Proceedings of the 17th International Conference on Distributed Computing Systems (ICDCS '97)
  • Year:
  • 1997

Abstract

We describe a centralized approach to testing that distributed fault-tolerant protocols satisfy their safety and timeliness specifications in the presence of the very failures they are designed to tolerate. Cesium is a testing environment based on the centralized simulation of distributed executions and failures. Processes are run in a single address space while providing the appearance of a truly distributed execution. The human tester can force the occurrence of arbitrary failures and security attacks. The implementations under test are not instrumented for testing purposes, and their source code need not be available. We prove that Cesium can execute exactly the set of runs feasible in the real distributed system being simulated. We also show that there are safety and timeliness properties in the specifications of many existing distributed protocols that cannot be tested in practical distributed systems. All of these properties can, however, be accurately tested by Cesium without introducing any perturbation in test experiments.
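
The general idea of centralized simulation with tester-controlled failures can be illustrated with a small sketch. This is not Cesium's implementation, which the abstract does not detail; it is a minimal discrete-event simulator written under our own assumptions. The names `Simulator`, `fault_hook`, `run_ping`, `drop_acks`, and `slow_acks` are all hypothetical. Every simulated process runs in one address space, a virtual clock replaces real time, and a hook supplied by the tester can drop or delay any message, so a timeliness property can be checked without perturbing the run.

```python
import heapq
from itertools import count


class Simulator:
    """Runs all simulated processes in one address space under a virtual clock."""

    def __init__(self, fault_hook=None):
        self.now = 0.0
        self.queue = []        # entries: (delivery_time, seq, dst, (src, msg))
        self.seq = count()     # unique tie-breaker keeps ordering deterministic
        self.procs = {}
        # Hypothetical hook: returns the delay to apply, or None to drop.
        self.fault_hook = fault_hook or (lambda src, dst, msg, delay: delay)

    def add(self, name, handler):
        self.procs[name] = handler

    def send(self, src, dst, msg, delay=1.0):
        delay = self.fault_hook(src, dst, msg, delay)
        if delay is None:      # injected omission failure: message is lost
            return
        entry = (self.now + delay, next(self.seq), dst, (src, msg))
        heapq.heappush(self.queue, entry)

    def run(self):
        # Deliver messages in virtual-time order until none remain.
        while self.queue:
            self.now, _, dst, (src, msg) = heapq.heappop(self.queue)
            self.procs[dst](self, src, msg)


def run_ping(fault_hook=None):
    """Client pings server; returns the virtual times at which acks arrive."""
    sim = Simulator(fault_hook)
    ack_times = []

    def server(sim, src, msg):
        if msg == "ping":
            sim.send("server", src, "ack")

    def client(sim, src, msg):
        if msg == "ack":
            ack_times.append(sim.now)

    sim.add("server", server)
    sim.add("client", client)
    sim.send("client", "server", "ping")
    sim.run()
    return ack_times


def drop_acks(src, dst, msg, delay):
    """Injected omission failure: every ack is lost."""
    return None if msg == "ack" else delay


def slow_acks(src, dst, msg, delay):
    """Injected timing failure: every ack is delayed by 5 extra time units."""
    return delay + 5.0 if msg == "ack" else delay
```

Calling `run_ping()` with no hook gives the failure-free run, while `run_ping(drop_acks)` and `run_ping(slow_acks)` replay the same protocol under an omission failure and a timing failure. Because time is virtual, a check such as "the ack arrives within 3 time units" depends only on the injected failures, not on the load of the machine running the test.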