Robust services in dynamic systems

  • Authors:
  • Barbara H. Liskov;Rodrigo Seromenho Miragaia Rodrigues

  • Affiliations:
  • Massachusetts Institute of Technology;Massachusetts Institute of Technology

  • Venue:
  • Robust services in dynamic systems
  • Year:
  • 2005

Abstract

Our growing reliance on online services accessible on the Internet demands highly available systems that work correctly without interruption. This thesis extends previous work on Byzantine-fault-tolerant replication to meet the new requirements of current Internet services: scalability and the ability to reconfigure the service automatically in the presence of a changing system membership. Our solution addresses two important problems that appear in dynamic replicated services. First, we present a membership service that provides servers and clients in the system with a sequence of consistent views of the system membership (i.e., the set of currently available servers). The membership service is designed to be scalable, and to handle membership changes mostly automatically. Furthermore, the membership service is itself reconfigurable, and tolerates arbitrary faults of a subset of the servers that are implementing it at any instant. The second part of our solution is a generic methodology for transforming replicated services that assume a fixed membership into services that support a dynamic system membership. The methodology uses the output from the membership service to decide when to reconfigure. We built two example services using this methodology: a dynamic Byzantine quorum system that supports read and write operations, and a dynamic Byzantine state machine replication system that supports any deterministic service. The final contribution of this thesis is an analytic study that points out an obstacle to the deployment of replicated services based on a dynamic membership. The basic problem is that maintaining redundancy levels for the service state as servers join and leave the system is costly in terms of network bandwidth.
To evaluate how dynamic the system membership can be, we developed a model for the cost of state maintenance in dynamic replicated services, and we use measured values from real-world traces to determine possible values for the parameters of the model. We conclude that certain deployments (like a volunteer-based system) are incompatible with the goals of large-scale reliable services. We implemented the membership service and the two example services. Our performance results show that the membership service is scalable, and our replicated services perform well, even during reconfigurations. (Abstract shortened by UMI.)
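
The bandwidth obstacle described in the abstract can be illustrated with a back-of-the-envelope sketch. This is not the model developed in the thesis; it is a simplified estimate under stated assumptions: every departure is permanent and loses all data held by that server, repair traffic is spread evenly over all servers, and the function name and parameters are hypothetical.

```python
def maintenance_bandwidth_bps(total_bytes, redundancy, num_nodes, mean_lifetime_s):
    """Rough average repair bandwidth per node, in bits per second.

    Assumptions (illustrative only, not taken from the thesis):
      - each of num_nodes servers stores an equal share of the redundant data,
      - a server stays in the system for mean_lifetime_s seconds on average,
      - when a server leaves, everything it stored must be copied again.
    """
    # Bytes each server holds once the data is stored with the given redundancy.
    stored_per_node = total_bytes * redundancy / num_nodes
    # Servers leave at rate num_nodes / mean_lifetime_s, and each departure
    # triggers stored_per_node bytes of repair traffic. Divided evenly among
    # num_nodes servers, that is stored_per_node / mean_lifetime_s per server.
    return 8 * stored_per_node / mean_lifetime_s


# Hypothetical deployment: 1 TB of data, 3 copies, 1000 servers.
one_day = maintenance_bandwidth_bps(1e12, 3, 1000, 86_400)
one_hour = maintenance_bandwidth_bps(1e12, 3, 1000, 3_600)
print(f"mean lifetime 1 day:  {one_day / 1e3:.0f} kbit/s per server")
print(f"mean lifetime 1 hour: {one_hour / 1e6:.2f} Mbit/s per server")
```

Under these assumptions the per-server cost is inversely proportional to how long servers stay in the system, which is one way to see why a highly dynamic membership, such as a volunteer-based system, is expensive to keep reliable.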