Transparent fault tolerance for distributed Ada applications

  • Authors:
  • Mark A. Breland;Steven A. Rogers;Guillaume P. Brat;Kenneth L. Nelson

  • Affiliations:
  • Microelectronics and Computer Technology Corporation (MCC), 3500 West Balcones Center Drive, Austin, Texas;Microelectronics and Computer Technology Corporation (MCC), 3500 West Balcones Center Drive, Austin, Texas;Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, Texas;Computing Devices International, 8800 Queen Avenue South, Bloomington, Minnesota

  • Venue:
  • TRI-Ada '94 Proceedings of the conference on TRI-Ada '94
  • Year:
  • 1994

Abstract

The advent of open architectures and initiatives in massively parallel supercomputing, combined with the maturation of distributed processing methods and algorithms, has enabled the implementation of responsive software-based fault tolerance. Expanding capabilities of distributed Ada runtime environments further stimulate the incorporation of hardware fault tolerance into critical, realtime embedded systems. Through the integration of proven Ada program component distribution and virtually synchronous communication protocols, we have established a benchmark fault tolerant system, which layers transparently between an Ada application and the runtime environment. Such transparence allows rapid reconfiguration of distribution and fault tolerance characteristics without change to the source code, thus enhancing portability, scalability, and reuse.

The Ada Fault Tolerance project has implemented software technologies which penetrate the envelope of an Ada program to detect, diagnose, and recover from hardware faults. These realtime facilities interact with the Rational distributed application development and runtime environment systems to service replicated Ada software tasks (i.e., threads of control). The deployed system proves that all replicated threads, including those of independently distributed components, can achieve timely consensus during periodic fault detection cycles through transparently embedded voting protocols. Our implementation uses a hybrid redundancy computation strategy and relies on a communication layer which provides virtual synchrony via a causal multicast protocol.
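
The abstract does not specify the voting protocol, so the following is only an illustrative sketch of the general idea of majority voting among replicated tasks during one fault detection cycle. It is written in Python rather than Ada for brevity, and the function name `majority_vote` and the replica names are hypothetical, not taken from the paper.

```python
from collections import Counter
from typing import Hashable, Optional


def majority_vote(outputs: dict[str, Hashable]) -> tuple[Optional[Hashable], list[str]]:
    """Return (agreed value, suspected replicas) for one detection cycle.

    `outputs` maps a hypothetical replica name to the value it reported.
    The agreed value is None when no value holds a strict majority.
    """
    if not outputs:
        return None, []
    # Find the most frequently reported value and how many replicas reported it.
    value, count = Counter(outputs.values()).most_common(1)[0]
    if count * 2 <= len(outputs):
        # No strict majority: consensus fails and every replica is suspect.
        return None, sorted(outputs)
    # Replicas that disagree with the majority are flagged as possibly faulty.
    suspects = sorted(name for name, out in outputs.items() if out != value)
    return value, suspects
```

For example, with three replicas reporting 42, 42, and 7, the sketch agrees on 42 and flags the third replica. A real system of the kind described would also need to bound the time spent waiting for replica outputs, which this sketch leaves out.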