This paper addresses the problem of building a failure detection service for large-scale distributed systems and multi-agent systems. It describes the failure detector mechanism and defines the roles it plays in such a system. It then presents the key construction problems that arise when building a failure detection service. Finally, it sketches a general framework for implementing such a service. The proposed failure detection service can be used by mobile agents as a crucial component for building fault-tolerant multi-agent systems.
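The abstract does not specify how the failure detector works internally. As a rough illustration of the general idea only, the following sketch shows a generic heartbeat-timeout detector, a common textbook scheme and not necessarily the design proposed in the paper. The class name `HeartbeatFailureDetector`, the `timeout` parameter, and the process identifiers are all hypothetical, and time is passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class HeartbeatFailureDetector:
    """Illustrative only: suspects a process whose last heartbeat
    is older than `timeout` seconds (a hypothetical parameter)."""

    timeout: float
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, process_id: str, now: float) -> None:
        # Record the time at which a heartbeat from process_id arrived.
        self.last_seen[process_id] = now

    def suspects(self, now: float) -> Set[str]:
        # A process is suspected if it has been silent longer than timeout.
        return {p for p, t in self.last_seen.items() if now - t > self.timeout}


detector = HeartbeatFailureDetector(timeout=3.0)
detector.heartbeat("agent-a", now=0.0)
detector.heartbeat("agent-b", now=0.0)
detector.heartbeat("agent-a", now=2.5)

# agent-b has been silent for 4.0 s, agent-a for only 1.5 s.
suspected_at_4 = detector.suspects(now=4.0)

# A late heartbeat clears the suspicion: the detector is unreliable
# in the sense that it may suspect a process that is merely slow.
detector.heartbeat("agent-b", now=4.5)
suspected_at_5 = detector.suspects(now=5.0)
```

In this sketch a suspicion is only a hint that can later be revised, which is why services built on such a detector usually treat its output as advisory rather than as proof of a crash.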