This dissertation presents viewstamped replication, a new algorithm for the implementation of highly available computer services that continue to be usable in spite of node crashes and network partitions. Our goal is to design an efficient mechanism that makes it easy for programmers to implement these services without complicating the programming model. Our replication method is based on a primary copy technique, where one replica is the primary and the others are backups, and is integrated into the fabric of an atomic transaction mechanism. Transactions are run only at the primary and need not involve the backups; the primary propagates the effects of transaction processing to the backups in the background. The method exhibits low delay during normal operation, has low overhead, and increases the likelihood that transactions will commit in spite of failures. When a failure occurs, replicas are reorganized automatically and a new primary is selected if the old one becomes inaccessible. This reorganization is called a view change and is accomplished by a view management algorithm. Since the primary communicates with the backups only in the background, the effects of some processing may be lost after a view change; the affected transactions must abort. If the effects are known at the new primary, then no information is lost and the transaction can commit. Furthermore, if transactions commit, we guarantee that their effects are not lost. A special kind of timestamp, called a viewstamp, allows the algorithm to distinguish these cases inexpensively.
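The role of the viewstamp in telling these cases apart can be illustrated with a minimal sketch. It is not the dissertation's code, and it rests on assumptions the abstract does not state: that a viewstamp pairs a view identifier with a timestamp local to that view, and that the new primary keeps, for each view, the latest timestamp it knows about. The names `Viewstamp`, `effects_known`, and `history` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Viewstamp:
    """Assumed shape of a viewstamp: a view id plus a timestamp in that view."""
    view_id: int     # identifies the view in which the event occurred
    timestamp: int   # position of the event within that view


def effects_known(history: Dict[int, int], needed: Iterable[Viewstamp]) -> bool:
    """Return True if every event a transaction depends on is known here.

    `history` is a hypothetical representation: it maps a view id to the
    latest timestamp the new primary knows for that view. A transaction
    may commit only if each of its viewstamps is covered by the history;
    otherwise some of its effects were lost and it must abort.
    """
    return all(
        vs.view_id in history and vs.timestamp <= history[vs.view_id]
        for vs in needed
    )


# The new primary knows events up to timestamp 7 in view 1 and 3 in view 2.
history = {1: 7, 2: 3}

print(effects_known(history, [Viewstamp(1, 5)]))  # True: effects survived, may commit
print(effects_known(history, [Viewstamp(2, 4)]))  # False: effects lost, must abort
```

Under these assumptions the check costs one dictionary lookup and one integer comparison per viewstamp, which is one way to read the claim that the cases can be distinguished inexpensively.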