An Architecture for Survivable Coordination in Large Distributed Systems

Authors:
Dahlia Malkhi; Michael K. Reiter
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2000

Abstract

Coordination among processes becomes very complex in a large-scale distributed system where messages may be delayed or lost and where processes may participate only transiently or behave arbitrarily, e.g., after suffering a security breach. In this paper, we propose a scalable architecture to support coordination under such extreme conditions. Our architecture consists of a collection of persistent data servers that implement simple shared data abstractions for clients, without trusting the clients or even the servers themselves. We show that, by interacting with these untrusted servers, clients can solve distributed consensus, a powerful and fundamental coordination primitive. Our architecture is practical, and we describe the implementation of its main components in a system called Fleet.
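
The abstract does not say how a client can extract a trustworthy value from servers it does not trust. One common technique for this setting is a masking-quorum read, sketched below under stated assumptions: there are n servers, at most b of them behave arbitrarily, n >= 4b + 1, and each server replies with a (timestamp, value) pair. The names quorum_size and masked_read are hypothetical and chosen for illustration; this is not Fleet's API, and the paper itself may use a different construction.

```python
from collections import Counter
from math import ceil


def quorum_size(n: int, b: int) -> int:
    """Threshold masking-quorum size for n servers, at most b faulty.

    Assumption: n >= 4b + 1, so that any two quorums overlap in at
    least 2b + 1 servers, of which at least b + 1 are correct.
    """
    if n < 4 * b + 1:
        raise ValueError("masking quorums need n >= 4b + 1")
    return ceil((n + 2 * b + 1) / 2)


def masked_read(replies: dict, b: int):
    """Pick a value from the replies of one quorum.

    replies maps a server id to the (timestamp, value) pair it returned.
    A pair is accepted only if at least b + 1 servers report it, so b
    faulty servers alone cannot forge it. Among accepted pairs the one
    with the highest timestamp wins; None means nothing was accepted.
    """
    counts = Counter(replies.values())
    vouched = [pair for pair, c in counts.items() if c >= b + 1]
    return max(vouched, key=lambda pair: pair[0], default=None)


# Hypothetical run: 5 servers, 1 possibly faulty, 4 replies collected.
replies = {
    "s1": (2, "x"),
    "s2": (2, "x"),
    "s3": (1, "w"),      # a correct but stale server
    "s4": (9, "evil"),   # a faulty server inventing a newer value
}
result = masked_read(replies, b=1)
```

In this run the forged pair from s4 carries the highest timestamp but has only one supporter, so it is discarded and the read returns the pair reported by both s1 and s2.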