Implementation and Performance Evaluation of an Adaptable Failure Detector

Authors:
Marin Bertier;Olivier Marin;Pierre Sens
Venue:
DSN '02 Proceedings of the 2002 International Conference on Dependable Systems and Networks
Year:
2002

Abstract

Chandra and Toueg introduced the concept of unreliable failure detectors and showed that adding such detectors to an asynchronous system makes it possible to solve the Consensus problem. In this paper, we propose a new implementation of a failure detector: an adaptable variant of the heartbeat failure detector that can support scalable applications. The implementation separates two concerns: a basic estimation of the expected arrival date of the next heartbeat, which provides a short detection time, and an adaptation of the quality of service to application needs. The latter relies on two mechanisms: an adaptation layer and a heuristic that adapts the sending period of "I am alive" messages.
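
The abstract does not give the estimation formula, so the following is only a minimal sketch of how a heartbeat detector might estimate the expected arrival date of the next "I am alive" message. It assumes a sliding-window average of past arrival offsets plus a fixed safety margin. The class name `HeartbeatEstimator` and the parameters `period`, `window`, and `margin` are hypothetical and are not taken from the paper; the paper's adaptation layer and sending-period heuristic are not modeled here.

```python
from collections import deque


class HeartbeatEstimator:
    """Illustrative heartbeat monitor (assumed design, not the paper's algorithm)."""

    def __init__(self, period, window=100, margin=0.1):
        self.period = period    # assumed fixed sending period, in seconds
        self.margin = margin    # assumed constant safety margin, in seconds
        self.samples = deque(maxlen=window)  # recent (sequence number, arrival time)

    def record(self, seq, arrival):
        """Store the arrival time of heartbeat number `seq`."""
        self.samples.append((seq, arrival))

    def expected_arrival(self):
        """Estimate when the next heartbeat should arrive, or None if no data."""
        if not self.samples:
            return None
        # Remove the nominal sending time from each arrival to get its offset,
        # average the offsets, then shift to the next sequence number.
        offsets = [arrival - self.period * seq for seq, arrival in self.samples]
        next_seq = self.samples[-1][0] + 1
        return sum(offsets) / len(offsets) + self.period * next_seq

    def suspects(self, now):
        """Suspect the sender once the estimate plus the margin has passed."""
        expected = self.expected_arrival()
        return expected is not None and now > expected + self.margin
```

A monitor would call `record` for each heartbeat it receives and poll `suspects` with the current time. A smaller `margin` shortens detection time at the cost of more false suspicions, which is the kind of trade-off the quality-of-service adaptation described above would tune.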