The φ Accrual Failure Detector

Authors:
Naohiro Hayashibara;Xavier Defago;Rami Yared;Takuya Katayama
Affiliations:
Japan Advanced Institute of Science and Technology (JAIST);Japan Advanced Institute of Science and Technology (JAIST)/ PRESTO, Japan Science and Technology Agency (JST);Japan Advanced Institute of Science and Technology (JAIST);Japan Advanced Institute of Science and Technology (JAIST)
Venue:
SRDS '04 Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems
Year:
2004

Abstract

The detection of failures is a fundamental issue for fault tolerance in distributed systems. Recently, many people have come to realize that failure detection ought to be provided as some form of generic service, similar to IP address lookup or time synchronization. However, this has not been successful so far, in part because classical failure detectors were not designed to satisfy several application requirements simultaneously. We present a novel abstraction, called accrual failure detectors, that emphasizes flexibility and expressiveness and can serve as a basic building block for implementing failure detectors in distributed systems. Instead of providing information of a binary nature (trust vs. suspect), accrual failure detectors output a suspicion level on a continuous scale. The principal merit of this approach is that it favors a nearly complete decoupling between application requirements and the monitoring of the environment. In this paper, we describe an implementation of such an accrual failure detector, which we call the φ failure detector. The particularity of the φ failure detector is that it dynamically adjusts the scale on which the suspicion level is expressed to current network conditions. We analyzed the behavior of our φ failure detector over an intercontinental communication link over a period of one week. Our experimental results show that φ performs as well as other known adaptive failure detection mechanisms, with improved flexibility.
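To make the idea of a continuous, self-scaling suspicion level concrete, here is a minimal sketch of a φ-style accrual detector. It assumes, as a modeling choice, that heartbeat inter-arrival times are roughly normally distributed, with mean and standard deviation estimated over a sliding window of recent arrivals. The suspicion level is φ = -log10(P_later(t_now - T_last)), where P_later(t) is the probability that the next heartbeat arrives more than t after the last one. The class name `PhiAccrualDetector` and the parameters `window_size` and `min_std` are hypothetical illustrations, not names taken from the paper or from any library.

```python
import math
from collections import deque
from statistics import mean, pstdev


class PhiAccrualDetector:
    """Hypothetical sketch of a phi accrual failure detector.

    Assumption: heartbeat inter-arrival times follow a normal distribution
    whose mean and standard deviation are estimated over a sliding window.
    """

    def __init__(self, window_size=1000, min_std=0.1):
        self.intervals = deque(maxlen=window_size)  # recent inter-arrival times
        self.last_arrival = None                    # time of the last heartbeat
        self.min_std = min_std                      # floor so sigma is never zero

    def heartbeat(self, now):
        """Record a heartbeat received at time `now`."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def phi(self, now):
        """Return the suspicion level at time `now` (0.0 until data exists)."""
        if self.last_arrival is None or not self.intervals:
            return 0.0
        mu = mean(self.intervals)
        sigma = max(pstdev(self.intervals), self.min_std)
        elapsed = now - self.last_arrival
        # P_later(elapsed) = 1 - CDF of Normal(mu, sigma) at `elapsed`,
        # computed with erfc for numerical accuracy in the tail.
        p_later = 0.5 * math.erfc((elapsed - mu) / (sigma * math.sqrt(2)))
        if p_later <= 0.0:
            return math.inf  # probability underflowed: maximal suspicion
        return -math.log10(p_later)


# Usage: two applications share one detector but apply their own thresholds.
det = PhiAccrualDetector()
for t in [0.0, 1.0, 2.0, 3.0, 4.0]:
    det.heartbeat(t)

for now in [5.0, 5.5, 6.0]:
    level = det.phi(now)
    aggressive = level > 1.0    # hypothetical low threshold: fast detection
    conservative = level > 8.0  # hypothetical high threshold: fewer mistakes
    print(f"t={now}: phi={level:.2f} aggressive={aggressive} conservative={conservative}")
```

This illustrates the decoupling described above: the detector only monitors arrivals and reports φ, while each application decides how much suspicion justifies action. Because μ and σ are re-estimated from recent heartbeats, the same threshold corresponds to a different timeout as network conditions change. The normal-distribution model and the `min_std` floor are assumptions of this sketch.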