Epidemic algorithms for replicated database maintenance
PODC '87 Proceedings of the sixth annual ACM Symposium on Principles of distributed computing
A hundred impossibility proofs for distributed computing
Proceedings of the eighth annual ACM Symposium on Principles of distributed computing
Impossibility of distributed consensus with one faulty process
Journal of the ACM (JACM)
Unreliable failure detectors for reliable distributed systems
Journal of the ACM (JACM)
The weakest failure detector for solving consensus
Journal of the ACM (JACM)
On scalable and efficient distributed failure detectors
Proceedings of the twentieth annual ACM symposium on Principles of distributed computing
On the Quality of Service of Failure Detectors
DSN '00 Proceedings of the 2000 International Conference on Dependable Systems and Networks (formerly FTCS-30 and DCCA-8)
SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol
DSN '02 Proceedings of the 2002 International Conference on Dependable Systems and Networks
A Gossip-Style Failure Detection Service
An Adaptive Failure Detection Protocol
PRDC '01 Proceedings of the 2001 Pacific Rim International Symposium on Dependable Computing
The φ Accrual Failure Detector
SRDS '04 Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems
Research challenges of autonomic computing
Proceedings of the 27th international conference on Software engineering
A new adaptive accrual failure detector for dependable distributed systems
Proceedings of the 2007 ACM symposium on Applied computing
A Scalable and Efficient Self-Organizing Failure Detector for Grid Applications
GRID '05 Proceedings of the 6th IEEE/ACM International Workshop on Grid Computing
Grouping algorithms for scalable self-monitoring distributed systems
Autonomics '08 Proceedings of the 2nd International Conference on Autonomic Computing and Communication Systems
ARCS'07 Proceedings of the 20th international conference on Architecture of computing systems
The growing complexity of distributed systems makes them increasingly difficult to manage. Such systems must therefore be able to adapt autonomously to their environment. They should exhibit so-called self-x properties such as self-configuration and self-healing. Autonomous failure detection in distributed environments is a crucial part of developing self-healing systems. In this paper, we introduce algorithms to form monitoring relations and propose using them for scalable autonomous failure detection. The evaluation of the developed algorithms indicates that they are suitable for complex, large-scale distributed systems.