The MAFT Architecture for Distributed Fault Tolerance

Authors:
Roger M. Kieckhafer;Chris J. Walter;Alan M. Finn;Philip M. Thambidurai
Affiliations:
Univ. of Nebraska, Lincoln;-;-;Duke Univ., Durham, NC
Venue:
IEEE Transactions on Computers - Fault-Tolerant Computing
Year:
1988

Abstract

This paper describes the Multicomputer Architecture for Fault Tolerance (MAFT), a distributed system designed to provide extremely reliable computation in real-time control systems. MAFT is based on the physical and functional partitioning of executive functions from application functions. Implementing the executive functions in a special-purpose hardware processor makes the fault-tolerance functions transparent to the application programs and minimizes overhead. Byzantine agreement and approximate agreement algorithms are applied to critical system parameters. MAFT supports the use of multiversion hardware and software to tolerate built-in or generic faults. Graceful degradation and restoration of the application workload are permitted in response to the exclusion and readmission of nodes, respectively.
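
To make the idea of approximate agreement more concrete, the following is a minimal sketch of one voting round of a generic fault-tolerant midpoint function, as commonly described in the approximate agreement literature. It is not taken from the MAFT paper and is not claimed to be MAFT's actual voting algorithm. The function name `fault_tolerant_midpoint` and the parameter `max_faults` are hypothetical names chosen for illustration.

```python
from typing import Sequence


def fault_tolerant_midpoint(values: Sequence[float], max_faults: int) -> float:
    """Run one illustrative approximate-agreement voting round.

    Assumption (not from the MAFT paper): each node receives one value
    from every node, and at most `max_faults` of those values may be
    arbitrarily wrong. Discarding the `max_faults` lowest and highest
    values guarantees the remaining extremes lie within the range of
    the correct values.
    """
    if max_faults < 0:
        raise ValueError("max_faults must be non-negative")
    if len(values) <= 2 * max_faults:
        raise ValueError("need more than 2 * max_faults values")

    ordered = sorted(values)
    # Drop the suspected outliers at both ends of the sorted list.
    kept = ordered[max_faults:len(ordered) - max_faults]
    # Midpoint of the surviving range becomes this node's new value.
    return (kept[0] + kept[-1]) / 2.0


if __name__ == "__main__":
    # Four readings of the same parameter; one (500.0) is faulty.
    readings = [10.0, 10.5, 9.5, 500.0]
    print(fault_tolerant_midpoint(readings, max_faults=1))
```

In this sketch the faulty reading cannot pull the result outside the range of the three correct readings, which is the property that lets correct nodes converge toward one another over repeated rounds.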