Tiered fault tolerance for long-term integrity

Authors:
Byung-Gon Chun;Petros Maniatis;Scott Shenker;John Kubiatowicz
Affiliations:
Intel Research Berkeley;Intel Research Berkeley;University of California at Berkeley;University of California at Berkeley
Venue:
FAST '09 Proceedings of the 7th conference on File and storage technologies
Year:
2009

Citing 45
Cited 3

Authentication in distributed systems: theory and practice

ACM Transactions on Computer Systems (TOCS)
New Hybrid Fault Models for Asynchronous Approximate Agreement

IEEE Transactions on Computers
Safe kernel extensions without run-time checking

OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
The part-time parliament

ACM Transactions on Computer Systems (TOCS)
Practical Byzantine fault tolerance

OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
Accountable certificate management using undeniable attestations

Proceedings of the 7th ACM conference on Computer and communications security
OceanStore: an architecture for global-scale persistent storage

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility

SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
COCA: A secure distributed online certification authority

ACM Transactions on Computer Systems (TOCS)
Reaching Approximate Agreement with Mixed-Mode Faults

IEEE Transactions on Parallel and Distributed Systems
Venti: A New Approach to Archival Storage

FAST '02 Proceedings of the Conference on File and Storage Technologies
A Digital Signature Based on a Conventional Encryption Function

CRYPTO '87 A Conference on the Theory and Applications of Cryptographic Techniques on Advances in Cryptology
Secure History Preservation Through Timeline Entanglement

Proceedings of the 11th USENIX Security Symposium
On Certificate Revocation and Validation

FC '98 Proceedings of the Second International Conference on Financial Cryptography
The Timely Computing Base: Timely Actions in the Presence of Uncertain Timeliness

DSN '00 Proceedings of the 2000 International Conference on Dependable Systems and Networks (formerly FTCS-30 and DCCA-8)
Model-carrying code: a practical approach for safe execution of untrusted applications

SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Separating agreement from execution for byzantine fault tolerant services

SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Efficient Byzantine-Tolerant Erasure-Coded Storage

DSN '04 Proceedings of the 2004 International Conference on Dependable Systems and Networks
Byzantine disk paxos: optimal resilience with byzantine shared memory

Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing
Shield: vulnerability-driven network filters for preventing known vulnerability exploits

Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications
The IBM PCIXCC: a new cryptographic coprocessor for the IBM eServer

IBM Journal of Research and Development
The LOCKSS peer-to-peer digital preservation system

ACM Transactions on Computer Systems (TOCS)
Solving Vector Consensus with a Wormhole

IEEE Transactions on Parallel and Distributed Systems
Travelling through wormholes: a new look at distributed systems models

ACM SIGACT News
Optimal Resilience for Erasure-Coded Byzantine Distributed Storage

DSN '06 Proceedings of the International Conference on Dependable Systems and Networks
Proactive resilience through architectural hybridization

Proceedings of the 2006 ACM symposium on Applied computing
Low complexity Byzantine-resilient consensus

Distributed Computing
A fresh look at the reliability of long-term digital storage

Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
Glacier: highly durable, decentralized storage despite massive correlated failures

NSDI'05 Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation - Volume 2
Proactive recovery in a Byzantine-fault-tolerant system

OSDI'00 Proceedings of the 4th conference on Symposium on Operating System Design & Implementation - Volume 4
Efficient replica maintenance for distributed storage systems

NSDI'06 Proceedings of the 3rd conference on Networked Systems Design & Implementation - Volume 3
Strong accountability for network storage

FAST '07 Proceedings of the 5th USENIX conference on File and Storage Technologies
Sealing OS processes to improve dependability and safety

Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Antiquity: exploiting a secure log for wide-area distributed storage

Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Low-overhead byzantine fault-tolerant storage

Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
PeerReview: practical accountability for distributed systems

Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Attested append-only memory: making adversaries stick to their word

Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Preservation DataStores: Architecture for Preservation Aware Storage

MSST '07 Proceedings of the 24th IEEE Conference on Mass Storage Systems and Technologies
POTSHARDS: secure long-term storage without encryption

ATC'07 2007 USENIX Annual Technical Conference on Proceedings of the USENIX Annual Technical Conference
Pergamum: replacing tape with energy efficient, reliable, disk-based archival storage

FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
Super-efficient verification of dynamic outsourced databases

CT-RSA'08 Proceedings of the 2008 The Cryptographers' Track at the RSA conference on Topics in cryptology
Uncertainty and predictability: can they be reconciled?

Future directions in distributed computing
The virtue of dependent failures in multi-site systems

HotDep'05 Proceedings of the First conference on Hot topics in system dependability
Beyond one-third faulty replicas in byzantine fault tolerant systems

NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
Saturn: a SAT-based tool for bug detection

CAV'05 Proceedings of the 17th international conference on Computer Aided Verification

Augmented smartphone applications through clone cloud execution

HotOS'09 Proceedings of the 12th conference on Hot topics in operating systems
Small trusted primitives for dependable systems

ACM SIGOPS Operating Systems Review
Integrity and consistency for untrusted services

SOFSEM'11 Proceedings of the 37th international conference on Current trends in theory and practice of computer science

Quantified Score

Hi-index	0.02

Abstract

Fault-tolerant services typically make assumptions about the type and maximum number of faults that they can tolerate while providing their correctness guarantees; when such a fault threshold is violated, correctness is lost. We revisit the notion of fault thresholds in the context of long-term archival storage. We observe that fault thresholds are inevitably violated in long-term services, making traditional fault tolerance inapplicable over the long term. In this work, we undertake a "reallocation of the fault-tolerance budget" of a long-term service. We split the service into service pieces, each of which can tolerate a different number of faults without failing (and without causing the whole service to fail): each piece belongs to a critical trusted fault tier, which must never fail, to an untrusted fault tier, which can fail massively and often, or to one of the fault tiers in between. By carefully engineering the split of a long-term service into pieces that must obey distinct fault thresholds, we can prolong its life before its inevitable demise. We demonstrate this approach with Bonafide, a long-term key-value store that, unlike all similar systems proposed in the literature, maintains integrity in the face of Byzantine faults without requiring self-certified data. We describe the notion of tiered fault tolerance and the design, implementation, and experimental evaluation of Bonafide, and we argue that our approach is a practical yet significant improvement over the state of the art for long-term services.
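
The idea of per-tier fault thresholds described in the abstract can be illustrated with a minimal sketch. This is not Bonafide's design or code: the class name, tier names, component counts, and threshold values below are all hypothetical, chosen only to show a trusted tier that tolerates no faults, an untrusted tier that tolerates the failure of every component, and one tier in between.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultTier:
    """One piece of a service with its own fault threshold (hypothetical model)."""
    name: str
    components: int   # number of components in this tier (assumed value)
    max_faults: int   # faults the tier tolerates while keeping its guarantee

    def within_threshold(self, observed_faults: int) -> bool:
        # The tier's guarantee holds only while faults stay at or below its threshold.
        return 0 <= observed_faults <= self.max_faults


def violated_tiers(tiers, observed):
    """Return the names of tiers whose observed fault count exceeds their threshold."""
    return [t.name for t in tiers if not t.within_threshold(observed.get(t.name, 0))]


# Illustrative tiers; the numbers are assumptions, not taken from the paper.
TIERS = [
    FaultTier("trusted", components=4, max_faults=0),       # must never fail
    FaultTier("intermediate", components=4, max_faults=1),  # bounded faults
    FaultTier("untrusted", components=4, max_faults=4),     # may fail entirely
]

if __name__ == "__main__":
    # Every untrusted component faulty, one intermediate fault: no threshold violated.
    print(violated_tiers(TIERS, {"untrusted": 4, "intermediate": 1}))
    # A single fault in the trusted tier violates that tier's threshold.
    print(violated_tiers(TIERS, {"trusted": 1}))
```

In this toy model, the service as a whole keeps its guarantee only while the list of violated tiers is empty, which is why placing as little as possible in the tiers with low thresholds matters.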