A Parsimonious Approach for Obtaining Resource-Efficient and Trustworthy Execution
IEEE Transactions on Dependable and Secure Computing
We propose a resource-efficient way to execute requests in Byzantine-fault-tolerant replication that is particularly well-suited for services in which request processing is resource-intensive. Previous efforts took a failure-masking all-active approach of using all 2t + 1 execution replicas to execute all requests, where t is the maximum number of failures tolerated. We describe an asynchronous execution protocol that combines failure masking with imperfect failure detection and checkpointing. Our protocol is parsimony-based since it uses only t + 1 execution replicas, called the primary committee or pc, to execute the requests normally. Under normal conditions, characterized by a stable network and no misbehavior by pc replicas, our approach enables a trustworthy reply to be obtained with the same latency as in the all-active approach, but with only about half of the overall resource use of the all-active approach. However, a request that exposes faults among the pc replicas will incur a higher latency than the all-active approach mainly due to fault detection latency. Under such conditions, the protocol switches to a recovery mode, in which all 2t + 1 replicas execute the request and send their replies. Then, after selecting a new pc, the request latency returns to the same level as that of all-active execution. Practical observations point to the fact that failures and instability are the exception rather than the norm. That motivated our decision to optimize resource efficiency for the common case, even if it means paying a slightly higher performance cost during periods of instability.
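The replica counts and the switch between normal and recovery modes can be illustrated with a minimal Python sketch. It is not the paper's protocol: the acceptance rule (a reply counts as trustworthy once t + 1 replicas report the same value, so at least one of them is correct) is an assumption made for illustration, and the names `pc_size`, `all_active_size`, `matching_reply`, `execute`, and `run_recovery` are hypothetical. Failure detection, checkpointing, and selection of a new pc are omitted.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, Optional, Tuple


def pc_size(t: int) -> int:
    """Replicas that execute a request in the normal case (the pc)."""
    return t + 1


def all_active_size(t: int) -> int:
    """Replicas that execute every request in the all-active approach."""
    return 2 * t + 1


def matching_reply(replies: Iterable[Hashable], t: int) -> Optional[Hashable]:
    """Return a value reported by at least t + 1 replicas, or None.

    Assumption for this sketch: with at most t faulty replicas, t + 1
    identical replies include at least one from a correct replica.
    """
    counts = Counter(replies)
    if not counts:
        return None
    value, votes = counts.most_common(1)[0]
    return value if votes >= t + 1 else None


def execute(
    pc_replies: Iterable[Hashable],
    run_recovery: Callable[[], Iterable[Hashable]],
    t: int,
) -> Tuple[Optional[Hashable], str]:
    """Try the pc first; fall back to all 2t + 1 replicas on disagreement.

    run_recovery is a hypothetical callback that makes every replica
    execute the request and returns the replies collected so far.
    """
    reply = matching_reply(pc_replies, t)
    if reply is not None:
        return reply, "normal"
    return matching_reply(run_recovery(), t), "recovery"


if __name__ == "__main__":
    t = 2
    # Normal case: 3 of 5 replicas execute, roughly half the work.
    print(pc_size(t), all_active_size(t), pc_size(t) / all_active_size(t))
    print(execute(["ok", "ok", "ok"], lambda: [], t))
    # One pc replica misbehaves, so all 2t + 1 replicas must execute.
    print(execute(["ok", "bad", "ok"], lambda: ["ok", "bad", "ok", "ok", "ok"], t))
```

The ratio (t + 1) / (2t + 1) is 2/3 for t = 1 and approaches 1/2 as t grows, which matches the abstract's "about half" of the all-active resource use in the normal case.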