Proactive recovery in a Byzantine-fault-tolerant system

Authors:
Miguel Castro;Barbara Liskov
Affiliations:
Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA;Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA
Venue:
OSDI'00 Proceedings of the 4th conference on Symposium on Operating System Design & Implementation - Volume 4
Year:
2000

Abstract

This paper describes an asynchronous state-machine replication system that tolerates Byzantine faults, which can be caused by malicious attacks or software errors. Our system is the first to recover Byzantine-faulty replicas proactively and it performs well because it uses symmetric rather than public-key cryptography for authentication. The recovery mechanism allows us to tolerate any number of faults over the lifetime of the system provided fewer than 1/3 of the replicas become faulty within a window of vulnerability that is small under normal conditions. The window may increase under a denial-of-service attack but we can detect and respond to such attacks. The paper presents results of experiments showing that overall performance is good and that even a small window of vulnerability has little impact on service latency.
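The fault bound stated in the abstract can be illustrated with a little arithmetic. The sketch below is not taken from the paper's implementation. The function names (`max_faulty`, `safe_over_lifetime`) and the model of a fault as a single timestamp that is repaired within one window are assumptions made for illustration only. It shows how "fewer than 1/3 of the replicas" translates to an integer bound, and why the total number of faults over the system's lifetime can exceed that bound as long as no single window does.

```python
from bisect import bisect_left


def max_faulty(n: int) -> int:
    """Largest f with f < n/3, equivalently the largest f with 3f + 1 <= n."""
    if n < 1:
        raise ValueError("need at least one replica")
    return (n - 1) // 3


def safe_over_lifetime(n: int, fault_times: list[float], window: float) -> bool:
    """Check that no half-open interval [t, t + window) holds more than f faults.

    Assumption (illustrative only): each fault is a single timestamp, and the
    affected replica is recovered within one window of becoming faulty.
    """
    f = max_faulty(n)
    times = sorted(fault_times)
    for i, start in enumerate(times):
        # Index of the first fault at or after the end of this window.
        end = bisect_left(times, start + window)
        if end - i > f:
            return False
    return True


if __name__ == "__main__":
    # Four replicas tolerate one fault per window.
    print(max_faulty(4))
    # Four faults in total, but spread out so each window sees only one.
    print(safe_over_lifetime(4, [0, 10, 20, 30], window=5))
    # Two faults inside the same window exceed the bound for n = 4.
    print(safe_over_lifetime(4, [0, 2], window=5))
```

With four replicas the bound is one fault per window, so four faults spread far apart are acceptable while two faults close together are not.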