Adapting failure detectors to communication network load fluctuations using SNMP and artificial neural nets

Authors:
Fábio Lima; Raimundo Macêdo
Affiliations:
Distributed Systems Laboratory (LaSiD), Computing Science Department, Federal University of Bahia, Salvador, BA, Brazil (both authors)
Venue:
LADC'05 Proceedings of the Second Latin-American conference on Dependable Computing
Year:
2005

References (13)

Consensus in the presence of partial synchrony. Journal of the ACM (JACM).
Congestion avoidance and control. SIGCOMM '88 Symposium proceedings on Communications architectures and protocols.
Impossibility of distributed consensus with one faulty process. Journal of the ACM (JACM).
Unreliable failure detectors for reliable distributed systems. Journal of the ACM (JACM).
The weakest failure detector for solving consensus. Journal of the ACM (JACM).
Optimal implementation of the weakest failure detector for solving consensus (brief announcement). Proceedings of the nineteenth annual ACM symposium on Principles of distributed computing.
On the Quality of Service of Failure Detectors. IEEE Transactions on Computers.
Neural Networks: A Comprehensive Foundation.
The Timely Computing Base: Timely Actions in the Presence of Uncertain Timeliness. DSN '00 Proceedings of the 2000 International Conference on Dependable Systems and Networks (formerly FTCS-30 and DCCA-8).
Implementation and Performance Evaluation of an Adaptable Failure Detector. DSN '02 Proceedings of the 2002 International Conference on Dependable Systems and Networks.
A General Framework to Solve Agreement Problems. SRDS '99 Proceedings of the 18th IEEE Symposium on Reliable Distributed Systems.
ADAPTATION - Algorithms to ADAPTive FAulT MonItOriNg and Their Implementation on CORBA. DOA '01 Proceedings of the Third International Symposium on Distributed Objects and Applications.
A Hybrid and Adaptive Model for Fault-Tolerant Distributed Computing. DSN '05 Proceedings of the 2005 International Conference on Dependable Systems and Networks.

Cited by (2)

Comparative analysis of quality of service and memory usage for adaptive failure detectors in healthcare systems. IEEE Journal on Selected Areas in Communications, special issue on wireless and pervasive communications for healthcare.
QoS self-configuring failure detectors for distributed systems. DAIS'10 Proceedings of the 10th IFIP WG 6.1 international conference on Distributed Applications and Interoperable Systems.

Abstract

A failure detector is an important building block for fault-tolerant distributed computing. Mechanisms such as distributed consensus and group communication rely on the information provided by failure detectors in order to make progress and terminate. Erroneous information from the failure detector (or the absence of information) may therefore delay decision-making or lead the upper-layer fault-tolerant mechanism to take incorrect decisions (e.g., the exclusion of a correct process from a group membership). On the other hand, the ability to implement failure detectors that precisely identify failures is limited by the actual behaviour of the system, especially in settings where message transmission delays and system loads vary over time. In this paper we explore the use of artificial neural networks to implement failure detectors that dynamically adapt to the current communication load. The training patterns fed to the neural network were obtained from Management Information Base (MIB) variables read through Simple Network Management Protocol (SNMP) agents. The output of the neural network is an estimate of the arrival time of the next heartbeat message from a remote process. The proposed approach was fully implemented and tested on a set of networked GNU/Linux workstations. To evaluate its efficiency, we ran a series of experiments in which network loads were varied randomly, measured several QoS parameters, and compared our detector against known implementations. The performance data collected indicate that neural networks and MIB variables can indeed be combined to improve the QoS of failure detectors.
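The abstract says the training patterns came from MIB variables read by SNMP agents. It does not say which variables were used or how they were preprocessed. The sketch below assumes that one input is a byte counter such as IF-MIB's `ifInOctets`, which is a Counter32 and wraps around at 2^32. Successive readings of such a counter could be turned into a load feature as shown. The function names are hypothetical. SNMP polling itself is left out, so the readings are passed in as plain integers.

```python
from typing import List

# Counter32 objects such as IF-MIB ifInOctets wrap around at 2**32.
COUNTER32_MODULUS = 2 ** 32


def counter_rate(prev_value: int, curr_value: int, interval_s: float) -> float:
    """Average increase per second between two Counter32 readings taken
    interval_s seconds apart, assuming at most one wrap-around in between."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # The modulo turns a wrapped (apparently negative) difference into the true increase.
    delta = (curr_value - prev_value) % COUNTER32_MODULUS
    return delta / interval_s


def load_features(readings: List[int], interval_s: float) -> List[float]:
    """Turn successive counter readings (polled every interval_s seconds)
    into per-interval rates, e.g. bytes per second for ifInOctets."""
    return [counter_rate(a, b, interval_s) for a, b in zip(readings, readings[1:])]


if __name__ == "__main__":
    # Hypothetical ifInOctets readings polled once per second; the second
    # interval crosses the 2**32 wrap-around at the same 40 kB/s rate.
    readings = [4_294_917_296, 4_294_957_296, 30_000]
    print(load_features(readings, 1.0))
```

On fast links the 64-bit `ifHCInOctets` counter (Counter64) wraps far less often. Using it would only change the modulus.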
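The abstract does not detail the network's topology or training procedure, so what follows is only an illustrative sketch. It is a one-hidden-layer perceptron with tanh units, trained by stochastic gradient descent to map normalised load features to a normalised heartbeat delay. The architecture, learning rate, the names `TinyMLP` and `synthetic_samples`, and the linear synthetic data are all assumptions made for illustration. None of them are the authors' network or measurements.

```python
import math
import random


class TinyMLP:
    """One hidden layer of tanh units and a linear output, trained with plain
    stochastic gradient descent on the squared prediction error."""

    def __init__(self, n_inputs: int, n_hidden: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x) -> float:
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def train(self, data, lr: float = 0.05, epochs: int = 400, seed: int = 1) -> None:
        rng = random.Random(seed)
        data = list(data)  # shuffle a copy, not the caller's list
        for _ in range(epochs):
            rng.shuffle(data)
            for x, y in data:
                h = self._hidden(x)
                err = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2 - y
                for j, hj in enumerate(h):
                    # Gradient at hidden unit j's pre-activation (uses the old w2[j]).
                    grad_pre = err * self.w2[j] * (1.0 - hj * hj)
                    self.w2[j] -= lr * err * hj
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * grad_pre * xi
                    self.b1[j] -= lr * grad_pre
                self.b2 -= lr * err


def mse(model, data) -> float:
    return sum((model.predict(x) - y) ** 2 for x, y in data) / len(data)


def synthetic_samples(n: int, seed: int = 2):
    """Hypothetical training patterns: two load features in [0, 1] (e.g. inbound
    and outbound link utilisation) and a normalised heartbeat delay that grows
    linearly with load. Not measured data."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = [rng.random(), rng.random()]
        samples.append((x, 0.1 + 0.3 * x[0] + 0.2 * x[1]))
    return samples


if __name__ == "__main__":
    data = synthetic_samples(100)
    model = TinyMLP(n_inputs=2)
    print(f"MSE before training: {mse(model, data):.5f}")
    model.train(data)
    print(f"MSE after training:  {mse(model, data):.5f}")
    print("delay estimate, low vs high load:",
          round(model.predict([0.1, 0.1]), 3), round(model.predict([0.9, 0.9]), 3))
```

On this toy data the trained model should do better than always predicting the mean delay. It should also estimate longer delays under heavier load. Real training patterns would instead pair sampled MIB values with the heartbeat arrival times actually observed.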
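Given any estimate of when the next heartbeat should arrive, the failure detector's decision rule can be sketched as below. The sketch assumes a simple rule: suspect the process once the estimated arrival time plus a fixed safety margin has passed without a heartbeat. The class `HeartbeatDetector`, the `replay` helper and the margin are hypothetical. In the paper's approach the estimate would come from the neural network. Here it is any callable, so a load-aware predictor can be plugged in. `replay` counts false suspicions of a process that never crashes, which relates to the mistake-oriented QoS parameters mentioned in the abstract.

```python
from typing import Callable, List, Optional, Tuple


class HeartbeatDetector:
    """Suspects a monitored process once the estimated arrival time of its next
    heartbeat, plus a safety margin, has passed without a heartbeat."""

    def __init__(self, estimate_next: Callable[[], float], margin: float):
        # estimate_next() returns the predicted time until the next heartbeat,
        # e.g. the output of a load-aware predictor (hypothetical interface).
        self.estimate_next = estimate_next
        self.margin = margin
        self.deadline: Optional[float] = None

    def on_heartbeat(self, now: float) -> None:
        """Record a heartbeat received at time `now` and set the next deadline."""
        self.deadline = now + self.estimate_next() + self.margin

    def suspects(self, now: float) -> bool:
        """True if the deadline for the next heartbeat has already passed."""
        return self.deadline is not None and now > self.deadline


def replay(arrivals: List[float], detector: HeartbeatDetector) -> Tuple[int, float]:
    """Replay heartbeat receipt times of a process that never crashes and return
    (number of false suspicions, total time spent wrongly suspecting it)."""
    mistakes, wrong_time = 0, 0.0
    for prev, nxt in zip(arrivals, arrivals[1:]):
        detector.on_heartbeat(prev)
        if detector.suspects(nxt):
            # The process was wrongly suspected from the deadline until `nxt`.
            mistakes += 1
            wrong_time += nxt - detector.deadline
    return mistakes, wrong_time


if __name__ == "__main__":
    trace = [0.0, 1.0, 2.0, 3.6, 4.6]  # one heartbeat delayed by 0.6 s
    print(replay(trace, HeartbeatDetector(lambda: 1.0, margin=0.2)))
    print(replay(trace, HeartbeatDetector(lambda: 1.6, margin=0.2)))
```

With a constant estimate of 1.0 s and a 0.2 s margin, the 1.6 s gap in the example trace causes one false suspicion lasting about 0.4 s. Raising the estimate to 1.6 s removes that mistake, as a load-aware predictor might under congestion. The cost is a later deadline, and therefore slower detection of a real crash.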
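The abstract reports comparisons against "known implementations" without naming them. One well-known non-neural baseline appears in the reference list: the expected-arrival estimator from "On the Quality of Service of Failure Detectors" (Chen, Toueg and Aguilera). It averages the receipt times A_i of the n most recent heartbeats, each shifted by its sequence number i times the sending period η: EA(l+1) = (1/n) Σ (A_i - η·i) + (l+1)·η. The process is suspected after the freshness point EA(l+1) + α. The sketch below assumes that formulation. The class and method names are hypothetical, and the abstract does not state whether this exact estimator was among those compared.

```python
from collections import deque
from typing import Deque, Tuple


class ChenStyleEstimator:
    """Estimates the arrival time of the next heartbeat from the n most recent
    receipts: EA(l+1) = (1/n) * sum(A_i - eta * i) + (l + 1) * eta, where A_i is
    the receipt time of heartbeat number i and eta the sending period."""

    def __init__(self, eta: float, window: int):
        self.eta = eta
        # Only the `window` most recent (sequence number, receipt time) pairs are kept.
        self.samples: Deque[Tuple[int, float]] = deque(maxlen=window)

    def record(self, seq: int, receipt_time: float) -> None:
        self.samples.append((seq, receipt_time))

    def expected_arrival(self, next_seq: int) -> float:
        if not self.samples:
            raise ValueError("no heartbeats recorded yet")
        # Average offset of each receipt from seq * eta (its nominal send time,
        # up to a constant clock offset between the two processes).
        offset = sum(t - self.eta * s for s, t in self.samples) / len(self.samples)
        return offset + next_seq * self.eta

    def freshness_point(self, next_seq: int, alpha: float) -> float:
        """Time after which the monitored process is suspected (EA plus margin alpha)."""
        return self.expected_arrival(next_seq) + alpha


if __name__ == "__main__":
    est = ChenStyleEstimator(eta=1.0, window=3)
    for seq, t in [(1, 1.1), (2, 2.1), (3, 3.1), (4, 4.5)]:
        est.record(seq, t)
    print(est.expected_arrival(5), est.freshness_point(5, alpha=0.2))
```

This estimator reacts to load only through recently observed delays. A predictor fed with MIB variables can also take current network conditions into account.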