BASE: Using abstraction to improve fault tolerance

Authors:
Miguel Castro; Rodrigo Rodrigues; Barbara Liskov
Affiliations:
Microsoft Research, Cambridge, UK; MIT Laboratory for Computer Science, Cambridge, MA; MIT Laboratory for Computer Science, Cambridge, MA
Venue:
ACM Transactions on Computer Systems (TOCS)
Year:
2003


Abstract

Software errors are a major cause of outages, and they are increasingly exploited in malicious attacks. Byzantine fault tolerance allows replicated systems to mask some software errors, but it is expensive to deploy. This paper describes a replication technique, BASE, which uses abstraction to reduce the cost of Byzantine fault tolerance and to improve its ability to mask software errors. BASE reduces cost because it enables reuse of off-the-shelf service implementations. It improves availability because each replica can be repaired periodically using an abstract view of the state stored by correct replicas, and because each replica can run distinct or nondeterministic service implementations, which reduces the probability of common-mode failures. We built an NFS service where each replica can run a different off-the-shelf file system implementation, and an object-oriented database where the replicas run the same nondeterministic implementation. These examples suggest that our technique can be used in practice: in both cases, the implementation required only a modest amount of new code, and our performance results indicate that the replicated services perform comparably to the implementations that they reuse.
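The two ideas in the abstract, running different implementations behind a common abstract view and repairing a replica from the abstract state held by correct replicas, can be illustrated with a toy key-value service. The sketch below is only an illustration under stated assumptions: every class and function name in it (DictStore, LogStore, Wrapper, agreed, invoke, repair) is hypothetical and is not taken from the BASE library, and the Byzantine agreement protocol that orders requests is left out entirely, so requests are simply applied to all replicas in the same order.

```python
from collections import Counter


class DictStore:
    """Hypothetical off-the-shelf implementation A: state kept in a dict."""
    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


class LogStore:
    """Hypothetical implementation B: state kept as an append-only log."""
    def __init__(self):
        self.log = []

    def put(self, key, value):
        self.log.append((key, value))

    def get(self, key):
        for k, v in reversed(self.log):  # the newest entry for a key wins
            if k == key:
                return v
        return None


class Wrapper:
    """Assumed wrapper shape: hides one implementation behind a common view."""
    impl_class = None

    def __init__(self):
        self.impl = self.impl_class()

    def execute(self, op, *args):
        return getattr(self.impl, op)(*args)

    def install(self, state):
        # Rebuild the concrete state from an abstract state.
        self.impl = self.impl_class()
        for key, value in state:
            self.impl.put(key, value)


class DictWrapper(Wrapper):
    impl_class = DictStore

    def abstract_state(self):
        return tuple(sorted(self.impl.data.items()))


class LogWrapper(Wrapper):
    impl_class = LogStore

    def abstract_state(self):
        # dict() keeps the last value logged for each key.
        return tuple(sorted(dict(self.impl.log).items()))


def agreed(values, f):
    """Return the value reported by at least f + 1 replicas."""
    value, count = Counter(values).most_common(1)[0]
    if count < f + 1:
        raise RuntimeError("no value has f + 1 matching copies")
    return value


def invoke(replicas, f, op, *args):
    """Run op on every replica and mask up to f wrong replies."""
    return agreed([r.execute(op, *args) for r in replicas], f)


def repair(target, replicas, f):
    """Reinstall target from the abstract state that f + 1 replicas agree on."""
    target.install(agreed([r.abstract_state() for r in replicas], f))


# Four replicas (n = 3f + 1 with f = 1) running two different implementations.
replicas = [DictWrapper(), LogWrapper(), DictWrapper(), LogWrapper()]
for key, value in [("a", 1), ("a", 2), ("b", 3)]:
    invoke(replicas, 1, "put", key, value)

replicas[0].impl.data["a"] = 99  # simulate a software error in one replica
```

In this sketch the two stores disagree on their concrete representation (a dict versus a log) yet produce the same abstract state, which is what lets a corrupted replica be rebuilt from its peers regardless of which implementation each peer runs.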