Distributed Fault Tolerance: Lessons from Delta-4

Authors:
David Powell
Venue:
IEEE Micro
Year:
1994

Citing 10
Cited 22

Replicated procedure call

ACM SIGOPS Operating Systems Review
AMp: a highly parallel atomic multicast protocol

SIGCOMM '89 Symposium proceedings on Communications architectures & protocols
Distributed systems

Distributed systems
Exploiting replication in distributed systems

Distributed systems
Implementing fault-tolerant services using the state machine approach: a tutorial

ACM Computing Surveys (CSUR)
A theoretician's view of fault tolerant distributed computing

Fault-tolerant distributed computing
Fault-tolerance in the advanced automation system

EW 4 Proceedings of the 4th workshop on ACM SIGOPS European workshop
Dependability: Basic Concepts and Terminology

Dependability: Basic Concepts and Terminology
Delta Four: A Generic Architecture for Dependable Distributed Computing

Delta Four: A Generic Architecture for Dependable Distributed Computing
Reliable Multicast between Micro-Kernels

Proceedings of the Workshop on Micro-kernels and Other Kernel Architectures

From group communication to transactions in distributed systems

Communications of the ACM
Chameleon: A Software Infrastructure for Adaptive Fault Tolerance

IEEE Transactions on Parallel and Distributed Systems
GUARDS: A Generic Upgradable Architecture for Real-Time Dependable Systems

IEEE Transactions on Parallel and Distributed Systems
Hierarchical Error Detection in a Software Implemented Fault Tolerance (SIFT) Environment

IEEE Transactions on Knowledge and Data Engineering
A Scalable Fault-Tolerant Network Management System Built Using Distributed Object Technology

EDOC '97 Proceedings of the 1st International Conference on Enterprise Distributed Object Computing
Comparing Fail-Silence Provided by Process Duplication versus Internal Error Detection for DHCP Server

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
From Experimental Assessment of Fault-Tolerant Systems to Dependability Benchmarking

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
A Process Migration Subsystem for a Workstation-Based Distributed Systems

HPDC '96 Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing
Specialized N-modular redundant processors in large-scale distributed systems

SRDS '96 Proceedings of the 15th Symposium on Reliable Distributed Systems
Comparison of Physical and Software-Implemented Fault Injection Techniques

IEEE Transactions on Computers
A Comprehensive Model for Software Rejuvenation

IEEE Transactions on Dependable and Secure Computing
Dependability through Assured Reconfiguration in Embedded System Software

IEEE Transactions on Dependable and Secure Computing
FLARe: a Fault-tolerant Lightweight Adaptive Real-time middleware for distributed real-time and embedded systems

Proceedings of the 4th on Middleware doctoral symposium
Jgroup-ARM: a distributed object group platform with autonomous replication management

Software—Practice & Experience
Three reliability engineering techniques and their application to evaluating the availability of IT systems: an introduction

IBM Systems Journal
FT-OSGi: Fault Tolerant Extensions to the OSGi Service Platform

OTM '09 Proceedings of the Confederated International Conferences, CoopIS, DOA, IS, and ODBASE 2009 on On the Move to Meaningful Internet Systems: Part I
Towards middleware for fault-tolerance in distributed real-time and embedded systems

DAIS'08 Proceedings of the 8th IFIP WG 6.1 international conference on Distributed applications and interoperable systems
Supporting component-based failover units in middleware for distributed real-time and embedded systems

Journal of Systems Architecture: the EUROMICRO Journal
An approach to experimentally obtain service dependability characteristics of the Jgroup/ARM system

EDCC'05 Proceedings of the 5th European conference on Dependable Computing
Architecting web services applications for improving availability

Architecting Dependable Systems III
Processing of confidential information in distributed systems by fragmentation

Computer Communications
An architecture for self-healing autonomous object groups

ATC'07 Proceedings of the 4th international conference on Autonomic and Trusted Computing

Quantified Score

Hi-index	0.02

Abstract

Because they avoid extensive redesign of specialized hardware, software-implemented approaches to fault tolerance are very resilient to change. Europe's Delta-4 project argues persuasively for implementing fault tolerance in a distributed fashion. The Delta-4 approach achieves fault tolerance by replicating capsules (runtime representations of application objects) on distributed, LAN-interconnected nodes. It can configure capsule groups to tolerate either stopping or arbitrary failures. Its multipoint protocols serve to coordinate capsule groups and to support error processing and fault treatment.
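
The difference between tolerating stopping failures and tolerating arbitrary failures can be illustrated with a small sketch. It is not taken from the Delta-4 design: the function names `replicas_needed` and `select_output` are hypothetical, and the replica counts are the textbook bounds for output selection (f + 1 replicas when faulty replicas simply fall silent, 2f + 1 when faulty replicas may return wrong values and the outputs are compared by voting).

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def replicas_needed(f: int, arbitrary: bool) -> int:
    """Group size needed to mask f faulty replicas (textbook bounds, an assumption here)."""
    if f < 0:
        raise ValueError("f must be non-negative")
    # Stopping failures: one surviving replica suffices, so f + 1.
    # Arbitrary failures: correct replicas must outvote faulty ones, so 2f + 1.
    return 2 * f + 1 if arbitrary else f + 1


def select_output(
    outputs: Sequence[Optional[Hashable]], f: int, arbitrary: bool
) -> Optional[Hashable]:
    """Pick the group's output from per-replica outputs.

    A None entry stands for a replica that stopped and sent nothing.
    Returns None when no value has enough support to be trusted.
    """
    # A silent-on-failure replica never lies, so a single answer is enough.
    # Otherwise a value needs f + 1 matching reports, which guarantees that
    # at least one of them came from a correct replica.
    threshold = f + 1 if arbitrary else 1
    counts = Counter(o for o in outputs if o is not None)
    if not counts:
        return None
    value, support = counts.most_common(1)[0]
    return value if support >= threshold else None
```

For example, with f = 1 a two-replica group survives one stopped member, since `select_output([42, None], 1, arbitrary=False)` still yields 42, while a three-replica group outvotes one wrong answer, since `select_output([42, 7, 42], 1, arbitrary=True)` also yields 42.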