Reliability Analysis in Distributed Systems

  • Authors: C. S. Raghavendra, V. K. P. Kumar, S. Hariri

  • Venue: IEEE Transactions on Computers
  • Year: 1988

Quantified Score

Hi-index 14.98

Abstract

Reliability of a distributed processing system is an important design parameter that can be described in terms of the reliability of processing elements and communication links and also of the redundancy of programs and data files. The traditional terminal-pair reliability does not capture the redundancy of programs and files in a distributed system. Two reliability measures are introduced: distributed program reliability, which describes the probability of successful execution of a program requiring cooperation of several computers, and distributed system reliability, which is the probability that all the specified distributed programs for the system are operational. These two reliability measures can be extended to incorporate the effects of user sites on reliability. An efficient approach based on graph traversal is developed to evaluate the proposed reliability measures.
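
To make the two measures concrete, the following is a minimal sketch that evaluates them by brute-force enumeration of link states on a tiny network. It is not the graph-traversal algorithm of the paper, which the abstract does not spell out. Everything here is an illustrative assumption: the function names `reachable` and `system_reliability`, the three-node triangle topology, the link reliability of 0.9, perfectly reliable nodes, and independent link failures. A program is taken to be operational when some site hosting it can reach at least one copy of every file it needs.

```python
from itertools import product


def reachable(start, up_links):
    """Return the set of nodes connected to `start` through working links."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for a, b in up_links:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
            elif b == u and a not in seen:
                seen.add(a)
                stack.append(a)
    return seen


def system_reliability(links, programs, file_sites):
    """Probability that every program in `programs` can run.

    links:      {(node_a, node_b): probability the link works}
    programs:   list of (sites_hosting_program, files_needed)
    file_sites: {file_name: set of nodes holding a copy}
    Assumptions (hypothetical): nodes never fail, links fail independently.
    With a single program this is the distributed program reliability.
    """
    edges = list(links)
    total = 0.0
    # Enumerate every up/down combination of the links (exponential cost,
    # acceptable only for very small illustrative networks).
    for states in product([True, False], repeat=len(edges)):
        prob = 1.0
        up = []
        for edge, is_up in zip(edges, states):
            prob *= links[edge] if is_up else 1.0 - links[edge]
            if is_up:
                up.append(edge)
        all_run = all(
            any(
                all(file_sites[f] & reachable(site, up) for f in needed)
                for site in sites
            )
            for sites, needed in programs
        )
        if all_run:
            total += prob
    return total


# Hypothetical triangle network, each link working with probability 0.9.
LINKS = {(1, 2): 0.9, (1, 3): 0.9, (2, 3): 0.9}

# One program at node 1 needing file F1, stored only at node 3.
single_copy = system_reliability(LINKS, [({1}, ["F1"])], {"F1": {3}})

# Same program, but F1 is replicated at nodes 2 and 3.
replicated = system_reliability(LINKS, [({1}, ["F1"])], {"F1": {2, 3}})

# Two programs (at nodes 1 and 2), both needing the single copy at node 3.
both_programs = system_reliability(
    LINKS, [({1}, ["F1"]), ({2}, ["F1"])], {"F1": {3}}
)
```

With a single copy of the file, the result coincides with the terminal-pair reliability between nodes 1 and 3. Replicating the file raises the value, which is the redundancy effect that terminal-pair reliability alone does not capture. Requiring both programs to run gives a lower value than either program alone.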