In this paper, we explore new failure models for multi-site systems: systems that consist of a collection of sites spread across a wide area network, each site formed by a set of computing nodes running processes. In particular, we introduce two failure models that allow sites to fail, and we use them to derive coteries. We argue that these coteries have better availability than quorums formed by a majority of processes, which are known to have the best availability when process failures are independent and identically distributed. To motivate introducing site failures explicitly into a failure model, we present availability data from a production multi-site system, showing that sites are frequently unavailable. We then discuss the implementability of our abstract models and show ways of realizing them in practice. Finally, we present evaluation results from running an implementation of the Paxos algorithm on PlanetLab using different quorum constructions. The results show that our constructions have substantially better availability and response time than majority coteries.
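For readers unfamiliar with the terminology, the following is a minimal Python sketch of the baseline the abstract compares against. It is not the paper's construction. It assumes the standard definition of a coterie (every two quorums intersect, and no quorum contains another) and the independent, identically distributed failure model, in which each process is down with probability `p`. The function names `is_coterie`, `majority_coterie`, and `majority_availability` are illustrative and do not come from the paper.

```python
from itertools import combinations
from math import comb


def is_coterie(quorums):
    """Check the two standard coterie properties on a collection of quorums."""
    qs = [frozenset(q) for q in quorums]
    if not qs or any(not q for q in qs):
        return False
    for a, b in combinations(qs, 2):
        if not (a & b):          # intersection: any two quorums share a process
            return False
        if a <= b or b <= a:     # minimality: no quorum contains another
            return False
    return True


def majority_coterie(processes):
    """All subsets containing a strict majority of the given processes."""
    procs = sorted(processes)
    size = len(procs) // 2 + 1
    return [frozenset(c) for c in combinations(procs, size)]


def majority_availability(n, p):
    """Probability that a strict majority of n processes is up, assuming each
    process fails independently with probability p (the i.i.d. assumption)."""
    k_min = n // 2 + 1
    return sum(comb(n, k) * (1 - p) ** k * p ** (n - k)
               for k in range(k_min, n + 1))


if __name__ == "__main__":
    print(is_coterie(majority_coterie(range(5))))
    print(majority_availability(3, 0.1))
```

The availability formula holds only under independent process failures. When an entire site can fail at once, failures of processes at that site are correlated, and this is the situation the site-failure models in the paper are meant to capture.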