Incorporation of Optimal Timeouts into Distributed Real-Time Load Sharing

  • Authors:
  • Chao-Ju Hou; K. G. Shin

  • Venue:
  • IEEE Transactions on Computers

  • Year:
  • 1994

Abstract

Consideration is given to the problem of designing and incorporating a timeout mechanism into load sharing (LS) with state-region change broadcasts in the presence of node failures in a distributed real-time system. A node's failure is diagnosed by the other nodes through communication timeouts, and the timeout period used to decide whether a node is faulty usually depends on dynamic changes in system load, the task attributes at the node, and the state the node was initially in. We formulate the problem of determining the "best" timeout period $T_{out}^{(i)}$ for node i as a hypothesis-testing problem, and maximize the probability of detecting node failures subject to a pre-specified probability of falsely diagnosing a healthy node as faulty. The parameters needed to calculate $T_{out}^{(i)}$ are estimated online by node i with a Bayesian technique and piggybacked on its region-change broadcasts; the broadcast information is then used to determine $T_{out}^{(i)}$. If node n has not heard from node i for $T_{out}^{(i)}$ since receiving the latest broadcast from node i, it considers node i failed, and does not consider any task transfer to node i until it receives a broadcast message from node i again. To further reduce the probability of incorrect diagnosis, each node n also determines its own timeout period $T_{out}^{(n)}$ and broadcasts its state not only at state-region changes but also when its state has remained within the same region throughout $T_{out}^{(n)}$.
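The abstract does not give a closed form for $T_{out}^{(i)}$, but the construction it describes (maximize detection probability subject to a bound on the false-alarm probability) has the shape of a Neyman-Pearson-style threshold choice. The sketch below is an illustrative reading, not the paper's actual derivation: it assumes a healthy node's inter-broadcast times are exponentially distributed with rate lam, estimates lam online with a conjugate Gamma prior (a stand-in for the Bayesian estimation the abstract mentions), and picks the smallest timeout T whose false-alarm probability P(silence > T | healthy) stays below a pre-specified bound alpha.

```python
import math

class TimeoutEstimator:
    """Hypothetical sketch: per-node timeout via a false-alarm bound.

    Assumes (not from the paper) that a healthy node's inter-broadcast
    times are i.i.d. Exponential(lam). With a conjugate Gamma(a, b)
    prior on lam, the posterior after observing intervals x_1..x_n is
    Gamma(a + n, b + sum(x_i)).
    """

    def __init__(self, alpha=0.01, a=1.0, b=1.0):
        self.alpha = alpha   # pre-specified bound on false-alarm probability
        self.a = a           # Gamma shape: prior pseudo-count of intervals
        self.b = b           # Gamma rate: prior pseudo-total elapsed time

    def observe_interval(self, x):
        """Bayesian update with one observed inter-broadcast interval."""
        self.a += 1.0
        self.b += x

    def timeout(self):
        """Smallest T with P(silence > T | healthy) <= alpha.

        Under Exponential(lam), P(silence > T) = exp(-lam * T), so the
        bound gives T = -ln(alpha) / lam; lam is replaced here by its
        posterior mean a / b.
        """
        lam_hat = self.a / self.b
        return -math.log(self.alpha) / lam_hat
```

Under these assumptions, alpha = 0.01 and a posterior mean inter-broadcast time of 50 ms (lam about 20 per second) give a timeout of roughly 230 ms. Each node would piggyback its current value on its region-change broadcasts, as the abstract describes, so that its peers always apply an up-to-date $T_{out}^{(i)}$.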
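The diagnosis rule on the receiving side is then simple bookkeeping. The sketch below is again only an illustrative reading of the abstract (the class name `PeerTable` and the message fields are assumptions): it records the timeout piggybacked on each broadcast, marks a node failed once it has been silent for its own advertised $T_{out}^{(i)}$, and excludes failed nodes from task-transfer candidates until they are heard from again.

```python
import time

class PeerTable:
    """Hypothetical bookkeeping for the diagnosis rule in the abstract."""

    def __init__(self):
        self.last_heard = {}   # node id -> time of latest broadcast
        self.t_out = {}        # node id -> timeout piggybacked by that node
        self.failed = set()    # nodes currently diagnosed as failed

    def on_broadcast(self, node, piggybacked_t_out, now=None):
        """A broadcast from `node` both refreshes it and un-fails it."""
        now = time.monotonic() if now is None else now
        self.last_heard[node] = now
        self.t_out[node] = piggybacked_t_out
        self.failed.discard(node)   # heard from again: eligible once more

    def refresh(self, now=None):
        """Diagnose as failed every node silent for its own T_out."""
        now = time.monotonic() if now is None else now
        for node, t in self.last_heard.items():
            if node not in self.failed and now - t > self.t_out[node]:
                self.failed.add(node)

    def transfer_candidates(self):
        """Nodes currently eligible as task-transfer targets."""
        return [n for n in self.last_heard if n not in self.failed]
```

The abstract's last sentence corresponds to the sender side of this table: a node whose state has stayed in the same region for its own $T_{out}^{(n)}$ rebroadcasts anyway, so healthy-but-quiet nodes are not misdiagnosed by their peers.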