High-Availability Computer Systems

  • Authors: Jim Gray; Daniel P. Siewiorek
  • Venue: Computer
  • Year: 1991

Quantified Score

Hi-index 4.11

Abstract

The techniques used to build highly available computer systems are sketched. Historical background is provided, and terminology is defined. Empirical experience with computer failure is briefly discussed. Device improvements that have greatly increased the reliability of digital electronics are identified. Fault-tolerant design concepts and approaches to fault-tolerant hardware are outlined. The role of repair and maintenance and of design-fault tolerance is discussed. Software repair is considered. The use of pairs of computer systems at separate locations to guard against unscheduled outages due to outside sources (communication or power failures, earthquakes, etc.) is addressed.