Computing grids are large-scale, highly distributed, often hierarchical platforms. At such scales, failures are no longer exceptions but part of normal behavior, so developers designing software for grids must take them into account. It is therefore crucial to conduct large-scale experiments under various volatility conditions in order to measure the impact of failures on the whole system. This paper presents an experimental tool that allows the user to inject failures during the practical evaluation of fault-tolerant systems. We illustrate the usefulness of our tool through an evaluation of a hierarchical grid failure detector.
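The abstract does not describe the tool's interface, so the following is only a minimal sketch of the kind of measurement a failure-injection experiment produces. It assumes a simple timeout-based heartbeat failure detector (a standard design, not necessarily the hierarchical detector evaluated in the paper) and crash-stop failures injected at scripted ticks. It reports detection latency for crashed nodes and false suspicions of live ones. All names (`HeartbeatMonitor`, `run_experiment`, `crash_schedule`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical timeout-based failure detector: a node is suspected
    when no heartbeat has arrived for more than `timeout` ticks."""
    timeout: int
    last_seen: dict = field(default_factory=dict)

    def on_heartbeat(self, node: str, now: int) -> None:
        self.last_seen[node] = now

    def suspects(self, now: int) -> set:
        return {n for n, t in self.last_seen.items() if now - t > self.timeout}


def run_experiment(nodes, crash_schedule, period=1, timeout=3, horizon=20):
    """Simulate `horizon` ticks of heartbeats under injected crashes.

    `crash_schedule` maps node -> tick at which the (hypothetical) injector
    crashes it; a crashed node never sends again (crash-stop model).
    Returns (latency, false_suspicions)."""
    monitor = HeartbeatMonitor(timeout)
    latency = {}              # crashed node -> ticks from crash to first suspicion
    false_suspicions = set()  # nodes suspected while still alive
    for now in range(horizon):
        # Live nodes send a heartbeat every `period` ticks.
        for node in nodes:
            crashed = node in crash_schedule and now >= crash_schedule[node]
            if not crashed and now % period == 0:
                monitor.on_heartbeat(node, now)
        # Classify every current suspicion as correct or mistaken.
        for node in monitor.suspects(now):
            crash_time = crash_schedule.get(node)
            if crash_time is not None and now >= crash_time:
                latency.setdefault(node, now - crash_time)
            else:
                false_suspicions.add(node)
    return latency, false_suspicions


if __name__ == "__main__":
    # Crash node "b" at tick 5; its last heartbeat is at tick 4,
    # so it is first suspected at tick 8 (8 - 4 > 3).
    print(run_experiment(["a", "b", "c"], {"b": 5}))
    # Timeout shorter than the heartbeat period: live node "a" is falsely suspected.
    print(run_experiment(["a"], {}, period=3, timeout=1))
```

Varying the injected crash times, heartbeat period, and timeout in such a loop exposes the usual trade-off: shorter timeouts lower detection latency but raise the risk of falsely suspecting live nodes.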