A failure detector is an important building block for constructing fault-tolerant distributed systems. In asynchronous distributed systems, failed processes are often indistinguishable from slow processes. A failure detector is an oracle that can intelligently suspect processes of having failed. Different classes of failure detectors have been proposed to solve different kinds of problems. Almost all of this work focuses on global failure detection and, moreover, assumes systems without mobile nodes or dynamic topologies. In this paper, we present ◇Pm, a local failure detector that can tolerate mobility and topology changes. This means that ◇Pm can distinguish between a failed process and a process that has moved away from its original location. We also establish an upper bound on the duration for which a process wrongly suspects a node that has moved away from its neighborhood. We support our theoretical results with experimental findings from an implementation of this algorithm for sensor networks.