Group communication is an important paradigm for building highly available distributed systems. However, group membership operations often require the system to block message traffic, causing system services to become unavailable. This makes it important to quantify the unavailability induced by membership operations. This paper experimentally evaluates the blocking behavior of the group membership protocol of the Ensemble group communication system using a novel global-state-based fault injection technique. In doing so, we demonstrate how a layered distributed protocol such as the Ensemble group membership protocol can be modeled in terms of a state machine abstraction, and show how the resulting global state space can be used to specify fault triggers and define important measures on the system. Using this approach, we evaluate the cost associated with important states of the protocol under varying workload and group size. We also evaluate the sensitivity of the protocol to the occurrence of a second correlated crash failure during its operation.
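As a rough illustration of the idea of specifying fault triggers over a global state space, the following minimal sketch models each node's protocol as a local state label and fires a crash fault when a predicate on the combined state holds. It is not the paper's implementation: the state names (`NORMAL`, `BLOCKING`, `CRASHED`), the node names, and the `Injector` and `FaultTrigger` classes are all hypothetical, and the sketch assumes the injector observes every local state change instantly, which a real distributed system cannot guarantee.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Global state: node name -> local protocol state (hypothetical labels).
GlobalState = Dict[str, str]


@dataclass
class FaultTrigger:
    """Crash `target` the first time `predicate` holds on the global state."""
    predicate: Callable[[GlobalState], bool]
    target: str
    fired: bool = False


@dataclass
class Injector:
    triggers: List[FaultTrigger]
    state: GlobalState = field(default_factory=dict)
    crashed: List[str] = field(default_factory=list)

    def notify(self, node: str, new_state: str) -> None:
        """Record a local state change, then evaluate every unfired trigger."""
        if node in self.crashed:
            return  # a crashed node reports no further transitions
        self.state[node] = new_state
        for trigger in self.triggers:
            if not trigger.fired and trigger.predicate(self.state):
                trigger.fired = True
                self.crashed.append(trigger.target)
                self.state[trigger.target] = "CRASHED"


def first_crash(gs: GlobalState) -> bool:
    # Inject the first fault once nodes a and b are both in normal operation.
    return gs.get("a") == "NORMAL" and gs.get("b") == "NORMAL"


def second_crash(gs: GlobalState) -> bool:
    # Correlated second fault: node a is blocking while some node is already down.
    return gs.get("a") == "BLOCKING" and "CRASHED" in gs.values()


inj = Injector(triggers=[
    FaultTrigger(first_crash, target="c"),
    FaultTrigger(second_crash, target="b"),
])

# Replay a hypothetical sequence of local state changes.
for node, local_state in [("a", "NORMAL"), ("b", "NORMAL"),
                          ("c", "NORMAL"), ("a", "BLOCKING")]:
    inj.notify(node, local_state)

print(inj.crashed)
print(inj.state)
```

Expressing triggers as predicates over the whole global state, rather than over a single node's state, is what lets the second trigger describe a correlated crash during the membership change. Measures such as time spent in a blocking state could be defined over the same state mapping, though that is not shown here.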