Experimental Evaluation of Failure-Detection Schemes in Real-time Communication Networks

  • Authors:
  • Seungjae Han; Kang G. Shin

  • Venue:
  • Proceedings of the 27th International Symposium on Fault-Tolerant Computing (FTCS '97)
  • Year:
  • 1997

Abstract

An effective failure-detection scheme is essential for reliable communication services. Most computer networks rely on behavior-based detection schemes: each node uses heartbeats to detect the failure of its neighbor nodes, and the transport protocol (such as TCP) achieves reliable communication through acknowledgment and retransmission. In this paper, we experimentally evaluate the effectiveness of such behavior-based detection schemes in real-time communication. Specifically, we measure and analyze the coverage and latency of two failure-detection schemes (neighbor detection and end-to-end detection) through fault-injection experiments. The experimental results show that a significant portion of failures can be detected very quickly by the neighbor detection scheme, while the end-to-end detection scheme uncovers the remaining failures with larger detection latencies.
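
To illustrate the heartbeat-based neighbor detection that the abstract describes, here is a minimal Python sketch. It is not the authors' implementation: the class `NeighborMonitor`, the `timeout` parameter, and the helper `detection_latency` are hypothetical names introduced only for illustration, and the rule that a neighbor is suspected after a fixed period of silence is an assumption about how such a scheme might work. Timestamps are passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class NeighborMonitor:
    """Tracks the last heartbeat seen from each neighbor (hypothetical helper)."""

    timeout: float  # assumed: seconds of silence before a neighbor is suspected
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, neighbor: str, now: float) -> None:
        # Record the arrival time of a heartbeat from `neighbor`.
        self.last_seen[neighbor] = now

    def suspected(self, now: float) -> list:
        # A neighbor is suspected once its silence exceeds the timeout.
        return sorted(
            n for n, t in self.last_seen.items() if now - t > self.timeout
        )


def detection_latency(failure_time: float, detection_time: float) -> float:
    # Latency taken here as the time from the failure to its detection
    # (an assumed definition for this sketch).
    return detection_time - failure_time


monitor = NeighborMonitor(timeout=3.0)
monitor.heartbeat("B", now=0.0)
monitor.heartbeat("C", now=0.0)
monitor.heartbeat("B", now=2.0)    # C stays silent after t = 0
print(monitor.suspected(now=4.0))  # ['C']
```

In this sketch a shorter `timeout` would lower detection latency at the cost of more false suspicions when heartbeats are merely delayed. Failures that leave heartbeats intact would go unnoticed by such a monitor, which is the kind of gap an end-to-end scheme is meant to cover.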