BASE: using abstraction to improve fault tolerance

Authors:
Rodrigo Rodrigues;Miguel Castro;Barbara Liskov
Affiliations:
MIT Laboratory for Computer Science, Cambridge, MA;Microsoft Research Ltd., Cambridge, UK;MIT Laboratory for Computer Science, Cambridge, MA
Venue:
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Year:
2001

Abstract

Software errors are a major cause of outages, and they are increasingly exploited in malicious attacks. Byzantine fault tolerance allows replicated systems to mask some software errors, but it is expensive to deploy. This paper describes a replication technique, BASE, which uses abstraction to reduce the cost of Byzantine fault tolerance and to improve its ability to mask software errors. BASE reduces cost because it enables reuse of off-the-shelf service implementations. It improves availability because each replica can be repaired periodically using an abstract view of the state stored by correct replicas, and because each replica can run distinct or non-deterministic service implementations, which reduces the probability of common-mode failures. We built an NFS service where each replica can run a different off-the-shelf file system implementation, and an object-oriented database where the replicas ran the same, non-deterministic implementation. These examples suggest that our technique can be used in practice: in both cases, the implementation required only a modest amount of new code, and our performance results indicate that the replicated services perform comparably to the implementations that they reuse.
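
The idea of repairing a replica from an abstract view of the state can be illustrated with a small sketch. This is not the BASE implementation: it assumes a toy key-value service, and every name in it (ListStore, DictStore, the wrapper classes, abstract_state, install, repair) is hypothetical and chosen only for illustration. Two different implementations keep different concrete state, each is wrapped so that it exposes the same abstract state, and a damaged replica is rebuilt from the abstract state that most other replicas report. The agreement step is a plain majority count, which is a simplification and does not model a Byzantine fault-tolerant protocol.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Abstract state shared by all replicas in this toy example: key -> value.
AbstractState = Dict[str, str]


class ListStore:
    """Hypothetical off-the-shelf implementation: a list of (key, value) pairs."""

    def __init__(self) -> None:
        self.pairs: List[Tuple[str, str]] = []

    def put(self, key: str, value: str) -> None:
        # Drop any older pair for this key, then append the new one.
        self.pairs = [(k, v) for k, v in self.pairs if k != key]
        self.pairs.append((key, value))


class DictStore:
    """Second hypothetical implementation with different concrete state."""

    def __init__(self) -> None:
        self.table: Dict[str, str] = {}
        self.writes = 0  # bookkeeping that is not part of the abstract state

    def put(self, key: str, value: str) -> None:
        self.table[key] = value
        self.writes += 1


class ListWrapper:
    """Wrapper that hides ListStore's concrete state behind the abstract view."""

    def __init__(self) -> None:
        self.impl = ListStore()

    def execute(self, key: str, value: str) -> None:
        self.impl.put(key, value)

    def abstract_state(self) -> AbstractState:
        return dict(self.impl.pairs)

    def install(self, state: AbstractState) -> None:
        # Rebuild concrete state from an abstract state.
        self.impl.pairs = sorted(state.items())


class DictWrapper:
    """Wrapper that hides DictStore's concrete state behind the abstract view."""

    def __init__(self) -> None:
        self.impl = DictStore()

    def execute(self, key: str, value: str) -> None:
        self.impl.put(key, value)

    def abstract_state(self) -> AbstractState:
        return dict(self.impl.table)

    def install(self, state: AbstractState) -> None:
        self.impl.table = dict(state)


def repair(replica, others) -> None:
    """Rebuild `replica` from the abstract state most of `others` report.

    Simplification: a majority count stands in for real agreement.
    """
    votes = Counter(tuple(sorted(o.abstract_state().items())) for o in others)
    agreed, _ = votes.most_common(1)[0]
    replica.install(dict(agreed))
```

Because the replicas are compared and repaired only through the abstract state, their concrete representations are free to differ, which is what lets distinct implementations sit behind the same replicated service in this sketch.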