Developers of early distributed systems took a simplistic approach to providing fault tolerance: They just used another copy of the same hardware as a backup. Later, others developed replication software to work on off-the-shelf hardware. Since neither of these methods is especially economical, a logical course is to take it one step further and eliminate the extra hardware altogether. Fully software-based replication relies on sophisticated techniques to keep track of server communications and ensure the consistency of information across several server replicas. How do you know that each server shares the same view of the data or program semantics? What happens if a server replica crashes? How do you make sure that a system processes invocations in the correct order? These are all problems that a replication technique has to handle. The authors describe two fundamental techniques, primary-backup and active replication, and illustrate how they handle these problems. At this point, both have advantages and disadvantages that depend on the application. The authors also propose that group communication provides a sufficient framework for implementing software-based replication. The concept of static and dynamic groups proves useful in thinking about how to implement replication techniques. Replication techniques can also use total-order and view-synchronous multicast primitives from group communication.
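The difference between the two techniques named above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Replica` class, the key-value counter service, and both driver functions are hypothetical names introduced here, failures are not modeled, and iterating over a Python list stands in for a total-order (active replication) or view-synchronous (primary-backup) multicast primitive.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A deterministic key-value counter service (hypothetical example)."""
    name: str
    state: dict = field(default_factory=dict)

    def execute(self, request):
        # Process one invocation: add delta to the counter stored under key.
        key, delta = request
        self.state[key] = self.state.get(key, 0) + delta
        return self.state[key]

    def install(self, update):
        # Adopt a state update computed elsewhere, without re-executing.
        key, value = update
        self.state[key] = value


def active_replication(replicas, requests):
    """Every replica executes every request, all in the same order."""
    replies = []
    for request in requests:  # list order stands in for total-order multicast
        results = [replica.execute(request) for replica in replicas]
        replies.append(results[0])  # the client can use the first reply
    return replies


def primary_backup(replicas, requests):
    """Only the primary executes; backups install the resulting updates."""
    primary, backups = replicas[0], replicas[1:]
    replies = []
    for request in requests:
        reply = primary.execute(request)
        update = (request[0], reply)  # key and its new value
        for backup in backups:  # stands in for view-synchronous multicast
            backup.install(update)
        replies.append(reply)
    return replies
```

In this sketch active replication requires `execute` to be deterministic, because every replica runs it independently, whereas primary-backup tolerates nondeterminism at the cost of routing every request through a single primary.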