Design of the notification system for failure detectors

Authors:
Naohiro Hayashibara; Makoto Takizawa
Affiliations:
Faculty of Computer Science and Engineering, Department of Computer Science, Kyoto Sangyo University, Japan; Faculty of Science and Technology, Department of Computers and Information Science, Seikei University, Japan
Venue:
International Journal of High Performance Computing and Networking
Year:
2009

Abstract

It is widely recognised that distributed systems would benefit greatly from a generic failure detection service. In this paper, we address the problem of constructing a monitoring network of failure detectors. We propose an algorithm to construct and manage a monitoring network in which each failure detector is itself monitored by a number of other failure detectors. Failure notifications are propagated along this network. In particular, the network can accommodate various types of failure detectors, from simple timeout-based failure detectors to accrual failure detectors, and it helps spread information about suspected processes and nodes. We have also simulated the proposed algorithm for constructing the monitoring network. The simulation shows that the algorithm scales as the number of failure detectors increases.
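The abstract does not give the construction algorithm itself, so the following Python sketch only illustrates the ideas it names, under loudly stated assumptions. The placement rule (each detector is watched by its `k` successors on a logical ring), the flooding-style notification, and all names (`build_monitoring_network`, `TimeoutDetector`, `AccrualDetector`, `propagate`) are hypothetical and are not the authors' method. The accrual detector assumes exponentially distributed heartbeat inter-arrival times, a simplification chosen here so that the suspicion level φ has a closed form. The sketch shows how a timeout-based detector (binary output) and an accrual detector (a suspicion level compared against a threshold) can sit side by side as monitors in the same network.

```python
import math
from collections import deque


def build_monitoring_network(nodes, k):
    """Hypothetical placement rule: node i is monitored by the k nodes that
    follow it on a logical ring. Returns {monitored node: [its monitors]}."""
    n = len(nodes)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(nodes)")
    return {node: [nodes[(i + s) % n] for s in range(1, k + 1)]
            for i, node in enumerate(nodes)}


class TimeoutDetector:
    """Binary detector: suspect if no heartbeat arrived within `timeout`."""

    def __init__(self, timeout):
        self.timeout = timeout

    def suspects(self, last_heartbeat, now, intervals):
        return now - last_heartbeat > self.timeout


class AccrualDetector:
    """Accrual detector: outputs a suspicion level phi; the caller decides
    by comparing phi with a threshold. Assumption: inter-arrival times are
    exponential with the observed mean, so P(later) = exp(-elapsed / mean)
    and phi = -log10(P(later)) = elapsed / (mean * ln 10)."""

    def __init__(self, threshold):
        self.threshold = threshold

    def phi(self, last_heartbeat, now, intervals):
        mean = sum(intervals) / len(intervals)
        return (now - last_heartbeat) / (mean * math.log(10))

    def suspects(self, last_heartbeat, now, intervals):
        return self.phi(last_heartbeat, now, intervals) >= self.threshold


def propagate(monitors, origin, suspected):
    """Flood a suspicion notice from `origin` over the (undirected)
    monitoring relation, skipping the suspected node.
    Returns {node: hop count from origin} for every node reached."""
    neighbours = {node: set() for node in monitors}
    for target, watchers in monitors.items():
        for w in watchers:
            neighbours[target].add(w)
            neighbours[w].add(target)
    reached = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt != suspected and nxt not in reached:
                reached[nxt] = reached[node] + 1
                queue.append(nxt)
    return reached


# Usage: eight detectors, each watched by two others; fd3 stops sending heartbeats.
nodes = [f"fd{i}" for i in range(8)]
net = build_monitoring_network(nodes, k=2)
detectors = {"fd4": TimeoutDetector(timeout=3.0), "fd5": AccrualDetector(threshold=1.0)}
history = [1.0, 1.2, 0.8, 1.0]  # recent inter-arrival times of fd3's heartbeats
for monitor in net["fd3"]:
    if detectors[monitor].suspects(last_heartbeat=10.0, now=13.5, intervals=history):
        reached = propagate(net, monitor, suspected="fd3")
        print(monitor, "suspects fd3; notified", len(reached) - 1, "other detectors")
```

Having several monitors per detector means a single monitor crash does not silence notifications about its targets, and mixing detector types only requires that each exposes a `suspects` decision; how the paper actually balances these choices is not stated in the abstract.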