Failure detectors are basic building blocks for constructing fault-tolerant distributed systems. The Quality of Service (QoS) of a failure detector refers to the speed and accuracy of its detections, and is defined with respect to the applications and computing environment under consideration. Existing failure detection approaches for distributed systems do not support the automatic (re)configuration of failure detectors from QoS requirements. However, when the behavior of the computing environment is unknown and changes over time, or when the application itself changes, self-configuration becomes a basic issue that must be addressed, particularly for applications with response-time and high-availability requirements. In this paper we present the design and implementation of a novel autonomic failure detector based on feedback control theory, which is capable of self-configuring its QoS parameters at runtime from previously specified QoS requirements.
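
To make the idea of feedback-driven self-configuration concrete, the following is a minimal sketch in Python. It is not the detector presented in the paper. It assumes a heartbeat-style detector with a single tunable timeout and a simple integral control law. The class name, the parameters (`target_mistake_rate`, `gain`, the timeout bounds), and the choice of mistake rate as the controlled QoS metric are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveTimeoutDetector:
    """Heartbeat-style failure detector with a feedback-tuned timeout.

    Hypothetical illustration: the names and the control law are
    assumptions, not the design described in the abstract.
    """

    target_mistake_rate: float  # QoS requirement: tolerated fraction of false suspicions
    gain: float = 0.5           # assumed integral gain (seconds per unit of rate error)
    timeout: float = 1.0        # current timeout, in seconds
    min_timeout: float = 0.1    # lower bound keeps detection from becoming hyperactive
    max_timeout: float = 10.0   # upper bound keeps detection time finite

    def suspects(self, last_heartbeat: float, now: float) -> bool:
        """Suspect the monitored process if its last heartbeat is older than the timeout."""
        return (now - last_heartbeat) > self.timeout

    def adapt(self, false_suspicions: int, heartbeats: int) -> float:
        """Run one control step over an observation window.

        A measured mistake rate above the target lengthens the timeout
        (fewer false suspicions, slower detection); a rate below the
        target shortens it (faster detection).
        """
        if heartbeats <= 0:
            return self.timeout  # nothing observed, keep the current setting
        measured = false_suspicions / heartbeats
        error = measured - self.target_mistake_rate
        new_timeout = self.timeout + self.gain * error
        self.timeout = min(self.max_timeout, max(self.min_timeout, new_timeout))
        return self.timeout
```

In this sketch the application states only the QoS requirement (`target_mistake_rate`), and the timeout is derived at runtime by calling `adapt` once per observation window. A real controller would also have to account for detection-time requirements and for the dynamics of the network, which this example leaves out.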