A fresh look at the reliability of long-term digital storage

  • Authors:
  • Mary Baker;Mehul Shah;David S. H. Rosenthal;Mema Roussopoulos;Petros Maniatis;TJ Giuli;Prashanth Bungale

  • Affiliations:
  • HP Labs;HP Labs;Stanford University;Harvard University;Intel Research Berkeley;Ford Research and Advanced Engineering;Harvard University

  • Venue:
  • Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
  • Year:
  • 2006

Abstract

Emerging Web services, such as email, photo sharing, and web site archives, must preserve large volumes of quickly accessible data indefinitely into the future. The costs of doing so often determine whether the service is economically viable. We make the case that these applications' demands on large-scale storage systems over long time horizons require us to reevaluate traditional system designs. We examine threats to long-lived data from an end-to-end perspective, taking into account not just hardware and software faults but also faults due to humans and organizations. We present a simple model of long-term storage failures that helps us reason about various strategies for addressing some of these threats. Using this model, we show that the most important strategies for increasing the reliability of long-term storage are detecting latent faults quickly, automating fault repair to make it cheaper and faster, and increasing the independence of data replicas.