Consistently and reliably replicating data is a fundamental task in distributed systems, and numerous existing protocols achieve such replication efficiently. When called on to build a large-scale enterprise storage system with built-in replication, we were therefore surprised to discover that no existing protocol met our requirements. As a result, we designed and deployed a new replication protocol called Niobe. Niobe belongs to the primary-backup family of protocols and shares many similarities with other members of that family, but we believe it is significantly more practical for large-scale enterprise storage than previously published protocols. In particular, Niobe is simple and flexible, has rigorously proven yet simply stated consistency guarantees, and exhibits excellent performance. Niobe has been deployed as the backend for a commercial Internet service; its consistency properties have been proved formally from first principles and further verified using the TLA+ specification language. We describe the protocol itself, the system built to deploy it, and some of our experiences in doing so.