Autonomous and scalable failure detection in distributed systems

Authors:
Benjamin Satzger;Andreas Pietzowski;Theo Ungerer
Affiliations:
Department of Computer Science, University of Augsburg, D-86135 Augsburg, Germany.;Department of Computer Science, University of Augsburg, D-86135 Augsburg, Germany.;Department of Computer Science, University of Augsburg, D-86135 Augsburg, Germany
Venue:
International Journal of Autonomous and Adaptive Communications Systems
Year:
2011


Abstract

The growing complexity of distributed systems makes them increasingly difficult to manage. Such systems therefore need to adapt autonomously to their environment. They should be characterised by so-called self-x properties such as self-configuration or self-healing. The autonomous detection of failures in distributed environments is a crucial part of developing self-healing systems. In this paper, we introduce algorithms to form monitoring relations and propose to utilise these relations for scalable, autonomous failure detection. The evaluation of the developed algorithms indicates that they are suitable for complex, large-scale distributed systems.