The MAFT Architecture for Distributed Fault Tolerance

  • Authors: Roger M. Kieckhafer; Chris J. Walter; Alan M. Finn; Philip M. Thambidurai
  • Affiliations: Univ. of Nebraska, Lincoln; -; -; Duke Univ., Durham, NC
  • Venue: IEEE Transactions on Computers - Fault-Tolerant Computing
  • Year: 1988

Abstract

A description is given of the multicomputer architecture for fault tolerance (MAFT), a distributed system designed to provide extremely reliable computation in real-time control systems. MAFT is based on the physical and functional partitioning of executive functions from application functions. The implementation of the executive functions in a special-purpose hardware processor allows the fault-tolerance functions to be transparent to the application programs and minimizes overhead. Byzantine agreement and approximate agreement algorithms are used for critical system parameters. MAFT supports the use of multiversion hardware and software to tolerate built-in or generic faults. Graceful degradation and restoration of the application workload are permitted in response to the exclusion and readmission of nodes, respectively.
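
The abstract names approximate agreement without saying which convergence function MAFT uses. As an illustration only, the following minimal Python sketch shows one generic approach, the fault-tolerant midpoint: each node sorts the values it received, discards the t lowest and t highest, and takes the midpoint of what remains. The function name, the parameter t (the assumed maximum number of faulty nodes), and the sample readings are hypothetical and are not taken from the paper.

```python
def fault_tolerant_midpoint(values, t):
    """One round of a generic approximate-agreement step (illustrative only).

    values: the readings one node received from all nodes, itself included.
    t: assumed upper bound on the number of faulty senders (hypothetical name).
    """
    if len(values) <= 2 * t:
        raise ValueError("need more than 2*t values to discard t from each end")
    ordered = sorted(values)
    # Drop the t smallest and t largest readings; any value a faulty node
    # pushed outside the range of the correct readings is removed here.
    trimmed = ordered[t:len(ordered) - t]
    # The midpoint of the surviving extremes is this node's new estimate.
    return (trimmed[0] + trimmed[-1]) / 2.0


# Hypothetical scenario: four nodes, one faulty. The faulty node sends a
# different wild value to node A than to node B (Byzantine behaviour).
readings_seen_by_a = [10.0, 10.2, 10.4, 1000.0]
readings_seen_by_b = [10.0, 10.2, 10.4, -1000.0]

vote_a = fault_tolerant_midpoint(readings_seen_by_a, t=1)
vote_b = fault_tolerant_midpoint(readings_seen_by_b, t=1)
```

In this scenario both estimates stay within the range of the correct readings (10.0 to 10.4), and they differ from each other by less than the correct readings originally did, which is the property approximate agreement is meant to provide. Exact (Byzantine) agreement, which the abstract also mentions, requires a different, multi-round message exchange and is not shown here.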