Agile Store: Experience with Quorum-Based Data Replication Techniques for Adaptive Byzantine Fault Tolerance

  • Authors:
  • Lei Kong;Deepak J. Manohar;Mustaque Ahamad;Arun Subbiah;Michael Sun;Douglas M. Blough

  • Affiliations:
  • College of Computing, Georgia Institute of Technology;College of Computing, Georgia Institute of Technology;College of Computing, Georgia Institute of Technology;Electrical and Computer Engineering, Georgia Institute of Technology;Electrical and Computer Engineering, Georgia Institute of Technology;Electrical and Computer Engineering, Georgia Institute of Technology

  • Venue:
  • SRDS '05 Proceedings of the 24th IEEE Symposium on Reliable Distributed Systems
  • Year:
  • 2005

Abstract

Quorum protocols offer several benefits when used to maintain replicated data, but techniques for reducing the overheads associated with them have not been explored in detail. It is desirable that a system be able to adapt its operation so that fault-tolerance-related overheads are incurred only when the protocol execution actually encounters faults. A number of issues need to be carefully examined to achieve such agility in quorum-based systems. We make use of a file system prototype, developed in our Agile Store project, to experimentally evaluate several techniques that are important for the efficient implementation of Byzantine fault-tolerant quorum protocols. We present an optimistic quorum collection scheme and a probabilistic hashing scheme for determining the response to a quorum request, and show that they lead to significant performance improvements. The Agile Store also makes use of reconfigurable quorum techniques to allow the system size and fault threshold to be varied dynamically when, for example, faulty servers are removed, new servers are added, or the threat level changes. We quantify the performance gains made possible by such reconfiguration of quorum parameters. We also show how performance scales with different system parameters and how it is affected by design choices such as whether to use proxies. We believe that the results in the paper provide important insights into how to implement quorum protocols that deliver good performance while achieving Byzantine fault tolerance.