Adapting failure detectors to communication network load fluctuations using SNMP and artificial neural nets

  • Authors:
  • Fábio Lima;Raimundo Macêdo

  • Affiliations:
  • Distributed Systems Laboratory – LaSiD, Computing Science Department, Federal University of Bahia, Salvador, BA, Brazil;Distributed Systems Laboratory – LaSiD, Computing Science Department, Federal University of Bahia, Salvador, BA, Brazil

  • Venue:
  • LADC'05 Proceedings of the Second Latin-American conference on Dependable Computing
  • Year:
  • 2005

Abstract

A failure detector is an important building block for fault-tolerant distributed computing: mechanisms such as distributed consensus and group communication rely on the information provided by failure detectors in order to make progress and terminate. Erroneous information from the failure detector, or the absence of information, may therefore delay decision-making or lead the upper-layer fault-tolerant mechanism to take incorrect decisions (e.g., the exclusion of a correct process from a group membership). On the other hand, how precisely a failure detector can identify failures is limited by the actual behaviour of the system, especially in settings where message transmission delays and system loads vary over time. In this paper we explore the use of artificial neural networks to implement failure detectors that adapt dynamically to the current communication load. The training patterns fed to the neural network were obtained by using Simple Network Management Protocol (SNMP) agents to read Management Information Base (MIB) variables. The output of the neural network is an estimate of the arrival time of the next heartbeat message from a remote process. The suggested approach was fully implemented and tested on a set of networked GNU/Linux workstations. To analyze the efficiency of our approach, we ran a series of experiments in which network loads were varied randomly, and we measured several QoS parameters, comparing our detector against known implementations. The performance data collected indicate that neural networks and MIB variables can indeed be combined to improve the QoS of failure detectors.
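The idea of deriving the heartbeat deadline from a load-dependent prediction can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `AdaptiveHeartbeatDetector`, the `safety_margin` parameter, and the `linear_predictor` function are hypothetical, and the linear function merely stands in for a trained neural network whose inputs would be load values read from MIB variables via SNMP.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


def linear_predictor(load_features: List[float]) -> float:
    """Hypothetical stand-in for a trained neural network.

    Maps normalized load features (e.g. values derived from MIB counters)
    to a predicted delay, in seconds, until the next heartbeat arrives.
    The coefficients are illustrative assumptions, not measured values.
    """
    mean_load = sum(load_features) / len(load_features)
    return 1.0 + 0.5 * mean_load


@dataclass
class AdaptiveHeartbeatDetector:
    # Any callable mapping load features to a predicted delay in seconds.
    predictor: Callable[[List[float]], float]
    # Extra slack added to the prediction to reduce false suspicions.
    safety_margin: float = 0.05
    # Time by which the next heartbeat is expected; None before the first one.
    deadline: Optional[float] = None

    def on_heartbeat(self, now: float, load_features: List[float]) -> None:
        """Record a heartbeat and recompute the deadline for the next one."""
        self.deadline = now + self.predictor(load_features) + self.safety_margin

    def suspects(self, now: float) -> bool:
        """Return True if the monitored process is currently suspected."""
        return self.deadline is not None and now > self.deadline


if __name__ == "__main__":
    detector = AdaptiveHeartbeatDetector(predictor=linear_predictor)
    detector.on_heartbeat(now=0.0, load_features=[0.0])   # idle network
    print(detector.suspects(now=1.1))
    detector.on_heartbeat(now=0.0, load_features=[1.0])   # loaded network
    print(detector.suspects(now=1.1))
```

Under these assumptions, the same late heartbeat triggers a suspicion on an idle network but not on a loaded one, because the predicted arrival time grows with the observed load.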