Chandra and Toueg introduced the concept of unreliable failure detectors and showed that adding such detectors to an asynchronous system makes it possible to solve the Consensus problem. In this paper, we propose a new implementation of a failure detector. It is a variant of the heartbeat failure detector that is adaptable and can support scalable applications. The implementation separates two concerns: a basic estimation of the expected arrival date of the next heartbeat, which provides a short detection time, and an adaptation of the quality of service to application needs. The latter relies on two mechanisms: an adaptation layer and a heuristic that adjusts the sending period of "I am alive" messages.
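
The abstract does not give the estimator itself, so the following is only a minimal sketch of the general heartbeat idea, not the paper's method. It assumes the expected arrival date is the last arrival plus the mean inter-arrival time over a sliding window, and that a fixed safety margin stands in for whatever the adaptation layer would tune. The class name `HeartbeatDetector` and the parameters `window` and `margin` are hypothetical.

```python
from collections import deque


class HeartbeatDetector:
    """Illustrative heartbeat failure detector (assumed design, not the paper's)."""

    def __init__(self, window=100, margin=0.5):
        # Recent heartbeat arrival timestamps, in seconds (sliding window).
        self.arrivals = deque(maxlen=window)
        # Hypothetical safety margin added to the expected arrival date.
        self.margin = margin

    def heartbeat(self, now):
        """Record the arrival of an "I am alive" message at time `now`."""
        self.arrivals.append(now)

    def expected_arrival(self):
        """Estimate the next arrival as last arrival + mean inter-arrival time."""
        if len(self.arrivals) < 2:
            return None  # not enough samples to estimate an interval
        first, last = self.arrivals[0], self.arrivals[-1]
        mean_interval = (last - first) / (len(self.arrivals) - 1)
        return last + mean_interval

    def suspects(self, now):
        """Suspect the peer once `now` passes the expected arrival plus margin."""
        expected = self.expected_arrival()
        if expected is None:
            return False
        return now > expected + self.margin


detector = HeartbeatDetector(window=10, margin=0.5)
for t in (0.0, 1.0, 2.0, 3.0):
    detector.heartbeat(t)
```

In this sketch a smaller `margin` shortens detection time at the cost of more wrong suspicions, which is the kind of trade-off a quality-of-service adaptation would have to manage per application.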