Because they avoid extensive redesign of specialized hardware, software-implemented approaches to fault tolerance are very resilient to change. Europe's Delta-4 project argues persuasively for implementing fault tolerance in a distributed fashion. The Delta-4 approach achieves fault tolerance by replicating capsules (runtime representations of application objects) on distributed, LAN-interconnected nodes. Capsule groups can be configured to tolerate either stopping failures or arbitrary failures. Its multipoint protocols coordinate capsule groups and support error processing and fault treatment.
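
The difference between the two failure modes can be illustrated with a minimal sketch of how a client might pick a result from a replicated group. This is not Delta-4 code: the function names, the use of `None` for a missing reply, and the `max_faulty` parameter are all assumptions made for illustration, and replicas are assumed to be deterministic so that correct replicas return identical values.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def select_fail_stop(replies: Sequence[Optional[Hashable]]) -> Hashable:
    """Stopping failures: a failed replica sends nothing (None here),
    so any reply that does arrive is taken as correct."""
    for reply in replies:
        if reply is not None:
            return reply
    raise RuntimeError("all replicas stopped")


def select_by_vote(replies: Sequence[Optional[Hashable]], max_faulty: int) -> Hashable:
    """Arbitrary failures: a faulty replica may send a wrong value, so a
    value is accepted only if at least max_faulty + 1 replies agree on it,
    which guarantees at least one of them came from a correct replica."""
    counts = Counter(r for r in replies if r is not None)
    if counts:
        value, votes = counts.most_common(1)[0]
        if votes >= max_faulty + 1:
            return value
    raise RuntimeError("no value has enough matching replies")


if __name__ == "__main__":
    # Hypothetical replies from a group of three replicas.
    print(select_fail_stop([None, 42, 42]))         # one replica stopped
    print(select_by_vote([7, 7, 99], max_faulty=1))  # one replica lied
```

Under these assumptions, masking stopping failures needs only one surviving replica, whereas masking arbitrary failures needs enough replicas for the correct ones to outvote the faulty ones, which is why the group configuration depends on the failure mode to be tolerated.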