A Parsimonious Approach for Obtaining Resource-Efficient and Trustworthy Execution

Authors:
HariGovind V. Ramasamy; Adnan Agbaria; William H. Sanders
Affiliations:
IEEE;IEEE;IEEE
Venue:
IEEE Transactions on Dependable and Secure Computing
Year:
2007

Abstract

We propose a resource-efficient way to execute requests in Byzantine-fault-tolerant replication that is particularly well suited to services in which request processing is resource-intensive. Previous efforts took a failure-masking, all-active approach in which all execution replicas execute all requests; at least 2t+1 execution replicas are needed to mask t Byzantine-faulty ones. We describe an asynchronous protocol that provides resource-efficient execution by combining failure masking with imperfect failure detection and checkpointing. Our protocol is parsimonious in that it uses only t+1 execution replicas, called the primary committee or PC, to execute requests under normal conditions, which are characterized by a stable network and no misbehavior by PC replicas; thus, a trustworthy reply can be obtained with the same latency as the all-active approach but with only about half of its overall resource use. However, a request that exposes faults among the PC replicas causes the protocol to switch to a recovery mode, in which all 2t+1 replicas execute the request and send their replies; then, after selecting a new PC, the protocol switches back to parsimonious execution. Such a request incurs a higher latency under our approach than under the all-active approach, mainly because of fault detection latency. In practice, failures and instability are the exception rather than the norm. That observation motivated our decision to optimize resource efficiency for the common case, even at the price of a slightly higher performance cost during periods of instability.
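
The counting argument in the abstract can be illustrated with a small Python sketch. It is a toy model built on assumptions that are not in the paper: replies are plain values, the first t+1 entries of the list stand for the primary committee, a missing reply is modeled as `None`, and the function name `parsimonious_reply` is hypothetical. It ignores asynchrony, failure detection, checkpointing, and the selection of a new committee, so it should not be read as the authors' protocol.

```python
from collections import Counter


def parsimonious_reply(t, replies):
    """Toy model of accepting a reply with at most t Byzantine replicas.

    `replies` holds one reply per execution replica (2t+1 in total);
    by assumption, indices 0..t are the primary committee (PC).
    Returns (reply, number_of_executions, mode).
    """
    if len(replies) != 2 * t + 1:
        raise ValueError("expected exactly 2t+1 replies")

    pc = replies[: t + 1]
    # t+1 identical replies include at least one from a correct replica,
    # so the common value can be trusted without running the other t replicas.
    if pc[0] is not None and len(set(pc)) == 1:
        return pc[0], t + 1, "parsimonious"

    # A mismatch or missing reply exposes a fault in the PC: fall back to
    # all 2t+1 replicas and accept a value backed by at least t+1 of them.
    counts = Counter(r for r in replies if r is not None)
    if counts:
        value, count = counts.most_common(1)[0]
        if count >= t + 1:
            return value, 2 * t + 1, "recovery"
    raise RuntimeError("no value has t+1 matching replies; more than t faults?")


if __name__ == "__main__":
    print(parsimonious_reply(1, ["ok", "ok", "ok"]))
    print(parsimonious_reply(1, ["ok", "bad", "ok"]))
```

In this model the fault-free case costs t+1 executions instead of 2t+1, which approaches half as t grows; the fallback path costs the full 2t+1.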