Proactive replication in distributed storage systems using machine availability estimation

  • Authors:
  • Alessandro Duminuco;Ernst Biersack;Taoufik En-Najjary

  • Affiliations:
  • Institut Eurecom, Sophia Antipolis, France;Institut Eurecom, Sophia Antipolis, France;Institut Eurecom, Sophia Antipolis, France

  • Venue:
  • CoNEXT '07 Proceedings of the 2007 ACM CoNEXT conference
  • Year:
  • 2007

Abstract

Distributed storage systems provide data availability by means of redundancy. To assure a given level of availability when nodes fail, new redundant fragments need to be introduced. Since node failures can be either transient or permanent, deciding when to generate new fragments is non-trivial. An additional difficulty is that the failure behavior, in terms of the rates of permanent and transient failures, may vary over time. To adapt to changes in the failure behavior, many systems adopt a reactive approach, in which new fragments are created as soon as a failure is detected. However, reactive approaches tend to produce spikes in bandwidth consumption. Proactive approaches create new fragments at a fixed rate that either depends on knowledge of the failure behavior or is set by the system administrator. However, existing proactive systems cannot adapt to changing failure behavior, which is common in the real world. We propose a new technique based on an ongoing estimation of the failure behavior, obtained using a model that consists of a network of queues. This scheme combines the adaptiveness of reactive systems with the smooth bandwidth usage of proactive systems, generalizing the two previous approaches. The choice between reactive and proactive repair thus becomes a special case of a broader approach that can be tuned to the dynamics of the failure behavior.
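The trade-off between reactive and proactive repair can be illustrated with a minimal sketch. It is not the network-of-queues estimator the abstract refers to; it substitutes a simple exponentially weighted moving average (EWMA) of observed failures, and the names `AdaptiveRepairRate`, `repair_schedule`, and `alpha` are hypothetical. The only point is to show how a single tuning parameter can move a repair policy between spiky, reactive-like behavior and smooth, proactive-like behavior.

```python
class AdaptiveRepairRate:
    """Hypothetical estimator: EWMA of failures observed per time step.

    Assumption: this stands in for the paper's queue-based estimation,
    which the abstract does not describe in detail.
    """

    def __init__(self, alpha: float, initial_rate: float = 0.0):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha        # weight given to the newest observation
        self.rate = initial_rate  # current estimate, used as the repair rate

    def observe(self, failures_this_step: int) -> float:
        """Fold one observation into the estimate and return the new rate."""
        self.rate = (1 - self.alpha) * self.rate + self.alpha * failures_this_step
        return self.rate


def repair_schedule(failures, alpha):
    """Return the repair rate chosen at each step for a failure trace."""
    estimator = AdaptiveRepairRate(alpha)
    return [estimator.observe(f) for f in failures]


if __name__ == "__main__":
    trace = [0, 0, 10, 0, 0]  # illustrative burst of failures, not real data
    print("alpha=1.0:", repair_schedule(trace, 1.0))
    print("alpha=0.2:", repair_schedule(trace, 0.2))
```

With `alpha = 1.0` the schedule reproduces the failure trace exactly, which mimics a reactive system and its bandwidth spikes. Smaller values of `alpha` spread the same repair work over later steps, closer to a fixed-rate proactive system, at the cost of reacting more slowly to a change in the failure rate.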