A failure detector is an important building block for fault-tolerant distributed computing: mechanisms such as distributed consensus and group communication rely on the information provided by failure detectors in order to make progress and terminate. Erroneous information from the failure detector, or the absence of information, may therefore delay decision-making or lead the upper-layer fault-tolerant mechanism to take incorrect decisions (e.g., the exclusion of a correct process from a group membership). On the other hand, how precisely a failure detector can identify failures is limited by the actual behaviour of the system, especially in settings where message transmission delays and system loads vary over time. In this paper we explore the use of artificial neural networks to implement failure detectors that adapt dynamically to the current communication load conditions. The training patterns fed to the neural network were obtained by using Simple Network Management Protocol (SNMP) agents to read Management Information Base (MIB) variables. The output of the neural network is an estimate of the arrival time of the next heartbeat message from a remote process. The suggested approach was fully implemented and tested on a set of networked GNU/Linux workstations. To analyze the efficiency of our approach, we ran a series of experiments in which network loads were varied randomly, and we measured several QoS parameters, comparing our detector against known implementations. The performance data collected indicate that neural networks and MIB variables can indeed be combined to improve the QoS of failure detectors.
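The heartbeat-based detection scheme described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the neural network is replaced by a simple mean of recent inter-arrival intervals, SNMP/MIB inputs are not modeled, and every name here (`AdaptiveHeartbeatDetector`, `mean_interval_predictor`, `safety_margin`, `initial_interval`) is hypothetical. The sketch only shows where an arrival-time estimator would plug into a detector.

```python
from collections import deque
from typing import Callable, Deque, Optional, Sequence


def mean_interval_predictor(intervals: Sequence[float]) -> float:
    """Stand-in estimator (assumption): mean of recent inter-arrival intervals.

    The paper uses a neural network fed with MIB variables here; this
    function only marks the place where such an estimator would go.
    """
    return sum(intervals) / len(intervals)


class AdaptiveHeartbeatDetector:
    """Suspects a remote process when its next heartbeat is overdue."""

    def __init__(
        self,
        predictor: Callable[[Sequence[float]], float] = mean_interval_predictor,
        window: int = 10,
        safety_margin: float = 0.5,
        initial_interval: float = 1.0,
    ) -> None:
        self.predictor = predictor
        # Sliding window of the most recent inter-arrival intervals.
        self.intervals: Deque[float] = deque(maxlen=window)
        self.safety_margin = safety_margin
        self.initial_interval = initial_interval
        self.last_arrival: Optional[float] = None
        self.deadline: Optional[float] = None

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat received at time `now` and update the deadline."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now
        # Until an interval has been observed, fall back to the initial guess.
        expected = (
            self.predictor(self.intervals)
            if self.intervals
            else self.initial_interval
        )
        self.deadline = now + expected + self.safety_margin

    def suspects(self, now: float) -> bool:
        """Return True if the next heartbeat is overdue at time `now`."""
        return self.deadline is not None and now > self.deadline


if __name__ == "__main__":
    detector = AdaptiveHeartbeatDetector()
    for arrival in (0.0, 1.0, 2.0, 4.0):
        detector.heartbeat(arrival)
    print(detector.suspects(5.0), detector.suspects(6.0))
```

Passing the estimator in as a callable keeps the timeout logic independent of how the estimate is produced, so a trained model could be substituted for `mean_interval_predictor` without changing the detector.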