Failure detection is a fundamental issue for fault tolerance in distributed systems. Recently, many people have come to realize that failure detection ought to be provided as some form of generic service, similar to IP address lookup or time synchronization. However, this has not been successful so far; one reason is that classical failure detectors were not designed to satisfy several application requirements simultaneously. We present a novel abstraction, called accrual failure detectors, that emphasizes flexibility and expressiveness and can serve as a basic building block for implementing failure detectors in distributed systems. Instead of providing information of a binary nature (trust vs. suspect), accrual failure detectors output a suspicion level on a continuous scale. The principal merit of this approach is that it favors a nearly complete decoupling between application requirements and the monitoring of the environment. In this paper, we describe an implementation of such an accrual failure detector, which we call the φ failure detector. The distinguishing feature of the φ failure detector is that it dynamically adjusts the scale on which the suspicion level is expressed to current network conditions. We analyzed the behavior of our φ failure detector over an intercontinental communication link for one week. Our experimental results show that φ performs as well as other known adaptive failure detection mechanisms, while offering greater flexibility.
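To make the idea of a continuous suspicion level concrete, here is a minimal Python sketch of an accrual-style detector. It is not taken from the abstract: it assumes that heartbeat inter-arrival times are roughly normally distributed and that the suspicion level is the negative base-10 logarithm of the probability that the next heartbeat is still to come. The class name `AccrualDetector`, its parameters (`window_size`, `min_std`), and the thresholds in the usage note are hypothetical and chosen only for illustration.

```python
import math
from collections import deque


class AccrualDetector:
    """Illustrative accrual-style failure detector (hypothetical names).

    Assumption: heartbeat inter-arrival times are approximately normal.
    """

    def __init__(self, window_size=1000, min_std=1e-3):
        # Sliding window of recent inter-arrival times; this is what lets
        # the suspicion scale follow current network conditions.
        self.intervals = deque(maxlen=window_size)
        self.last_arrival = None
        self.min_std = min_std  # floor to avoid division by zero

    def heartbeat(self, now):
        """Record a heartbeat received at time `now` (seconds)."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def suspicion(self, now):
        """Return a suspicion level that grows as the silence lengthens."""
        if not self.intervals:
            return 0.0  # not enough history to suspect anything
        n = len(self.intervals)
        mean = sum(self.intervals) / n
        var = sum((x - mean) ** 2 for x in self.intervals) / n
        std = max(math.sqrt(var), self.min_std)
        elapsed = now - self.last_arrival
        # Probability that a heartbeat arrives later than `elapsed`,
        # i.e. the upper tail of the assumed normal distribution.
        p_later = 0.5 * math.erfc((elapsed - mean) / (std * math.sqrt(2)))
        p_later = max(p_later, 1e-300)  # avoid log10(0) for long silences
        return -math.log10(p_later)


detector = AccrualDetector()
for t in (0.0, 0.9, 2.0, 2.9, 4.0):  # heartbeats about one second apart
    detector.heartbeat(t)

on_time = detector.suspicion(5.0)  # silence equal to the mean interval
late = detector.suspicion(5.5)     # silence well beyond the mean interval
```

Two applications can read the same value and apply different thresholds to it, for example a cautious one reacting at a suspicion level of 1 and a conservative one waiting for 8. This is one way to picture the decoupling between monitoring and application requirements that the abstract describes.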