Distributed systems constructed using off-the-shelf communication infrastructures are becoming common vehicles for doing business in many important application domains. Large geographic extent due to increased globalization, increased probability of failures, and highly dynamic loads all contribute toward a partitionable and asynchronous characterization of these systems. In this paper, we consider the problem of developing reliable applications to be deployed in partitionable asynchronous distributed systems. What makes this task difficult is guaranteeing the consistency of shared state despite asynchrony, failures, and recoveries, including the formation and merging of partitions. While view synchrony within process groups is a powerful paradigm that can significantly simplify reasoning about asynchrony and failures, it is insufficient for coping with recoveries and the merging of partitions after repairs. We first give an abstract characterization of shared state management in partitionable asynchronous distributed systems and then show how views can be enriched to convey structural and historical information relevant to the group's activity. The resulting paradigm, called enriched view synchrony, can be implemented efficiently and leads to a simple programming methodology for solving shared state management problems in the presence of partitions.