It is widely recognised that distributed systems would greatly benefit from the availability of a generic failure detection service. In this paper, we address the construction of the monitoring network that connects failure detectors. We propose an algorithm to construct and manage a monitoring network in which each failure detector is itself monitored by some other failure detectors. Failure notifications are propagated along this network. In particular, the network can incorporate various types of failure detectors, from simple timeout-based failure detectors to accrual failure detectors, and it helps to spread information about suspected processes/nodes. In addition, we simulate the proposed algorithm for constructing the monitoring network. The simulation shows that the algorithm scales as the number of failure detectors increases.
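The abstract does not spell out the construction algorithm, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a hypothetical ring layout in which each failure detector is monitored by its next `k` neighbours, and it assumes that a failure notification is flooded from the monitors of the failed detector along the monitoring edges. The names `build_monitoring_network`, `propagate_failure`, and the parameter `k` are illustrative.

```python
from collections import deque


def build_monitoring_network(detectors, k):
    """Assign k monitors to each detector.

    Assumption: a ring layout where detector i is monitored by the
    next k detectors on the ring. This is illustrative only and is
    not taken from the paper.
    """
    n = len(detectors)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of detectors")
    return {
        d: [detectors[(i + j) % n] for j in range(1, k + 1)]
        for i, d in enumerate(detectors)
    }


def propagate_failure(monitored_by, failed):
    """Return the set of detectors that learn about the failure.

    Assumption: the monitors of `failed` notice the failure first,
    and every informed detector forwards the notification to its
    own monitors (simple flooding over the monitoring edges).
    """
    informed = set()
    queue = deque(monitored_by[failed])
    while queue:
        d = queue.popleft()
        if d == failed or d in informed:
            continue  # the failed detector receives nothing; skip repeats
        informed.add(d)
        queue.extend(monitored_by[d])
    return informed


if __name__ == "__main__":
    detectors = [f"fd{i}" for i in range(6)]
    network = build_monitoring_network(detectors, k=2)
    print(network["fd3"])
    print(sorted(propagate_failure(network, "fd3")))
```

Under these assumptions every surviving detector eventually learns about the failure, because the ring edges connect all detectors. A real deployment would also have to handle concurrent failures and detectors joining or leaving, which this sketch ignores.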