Experimental Evaluation of Behavior-Based Failure-Detection Schemes in Real-Time Communication Networks

Authors:
Seungjae Han; Kang G. Shin
Affiliations:
The Univ. of Michigan, Ann Arbor;-
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1999

Abstract

Effective detection of failures is essential for reliable communication services. Traditionally, non-real-time computer networks have relied on behavior-based techniques for detecting communication failures. That is, each node uses heartbeats to detect the failure of its neighbors and the end-to-end transport protocol (e.g., TCP) achieves reliable communication by acknowledgment/retransmission. Recently, there has been a growing demand for reliable "real-time" communication, but little research has been done on the failure detection problem. In this paper, we present two behavior-based failure-detection schemes, neighbor detection and end-to-end detection, for reliable real-time communication services and experimentally evaluate their effectiveness. Specifically, we measure and analyze the coverage and latency of these detection schemes through fault-injection experiments. The experimental results have shown that nearly all failures can be detected very quickly by the neighbor detection scheme, while the end-to-end detection scheme uncovers the remaining failures with larger detection latencies.
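
The heartbeat idea and the two measured quantities (coverage and latency) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class `NeighborMonitor`, the function `coverage_and_mean_latency`, the timeout value, and the sample latencies are all hypothetical and chosen only for illustration. Coverage is assumed here to mean the fraction of injected faults that were detected, and latency the time from injection to detection.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class NeighborMonitor:
    """Hypothetical heartbeat monitor: a neighbor is suspected once no
    heartbeat has been seen from it for more than `timeout` seconds."""
    timeout: float
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, neighbor: str, now: float) -> None:
        # Record the arrival time of the latest heartbeat from this neighbor.
        self.last_seen[neighbor] = now

    def suspected(self, now: float) -> List[str]:
        # Neighbors whose last heartbeat is older than the timeout.
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout)


def coverage_and_mean_latency(
    latencies: List[Optional[float]],
) -> Tuple[float, Optional[float]]:
    """Summarize a fault-injection run. Each entry is the detection latency
    of one injected fault, or None if that fault was never detected."""
    detected = [x for x in latencies if x is not None]
    coverage = len(detected) / len(latencies) if latencies else 0.0
    mean_latency = sum(detected) / len(detected) if detected else None
    return coverage, mean_latency


# Illustrative usage with made-up times (seconds).
monitor = NeighborMonitor(timeout=3.0)
monitor.heartbeat("B", now=0.0)
monitor.heartbeat("C", now=0.0)
monitor.heartbeat("B", now=2.0)       # B keeps sending, C goes silent
print(monitor.suspected(now=4.0))     # C has been silent for 4.0 s > 3.0 s
print(coverage_and_mean_latency([1.0, 3.0, None, 2.0]))
```

Passing the current time explicitly, rather than reading a clock inside the monitor, keeps the sketch deterministic. A smaller timeout would shorten detection latency at the cost of more false suspicions when heartbeats are merely delayed.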