Efficient archival data storage

  • Authors:
  • Darrell D. Long; Lawrence L. You

  • Affiliations:
  • University of California, Santa Cruz; University of California, Santa Cruz

  • Venue:
  • Efficient archival data storage
  • Year:
  • 2006

Abstract

The amount of data that is created and must be stored continues to grow. Archival storage systems must retain large volumes of data reliably, over long periods of time, at low cost. Archival storage requirements, and the types of data stored, vary widely, from highly compressed to highly redundant. The ever-increasing volume of archival data that must be retained for long periods has motivated the design of low-cost, high-efficiency storage systems. Economic factors, such as the rapidly decreasing cost of disk storage, memory, and processing, together with technological improvements, such as increased magnetic storage densities, have moved research and development toward disk-based archival storage. To lower cost further, these systems eliminate redundancy using inter-file and intra-file data compression. Each system uses a compression method, but no system consistently compresses data better than every other efficient storage method.
Our main contribution, presented in this dissertation, is to prove the thesis that it is possible to create a scalable archival storage system that efficiently stores diverse data by progressively applying large-scale data compression, providing better space efficiency than any single existing method. To support this thesis, our work identifies common properties of these systems, evaluates efficient storage methods with respect to these properties, and presents a model of their expected space and time behavior.
In addition, we have developed a prototype storage system using the Progressive Redundancy Elimination of Similar and Identical Data In Objects (PRESIDIO) framework. The PRE algorithm detects similar and identical files. Data is recorded using a virtual content-addressable storage (VCAS) mechanism that can store content with hybrid inter-file compression methods.
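The idea of progressively eliminating redundancy (first skip data that is identical to something already stored, then encode data relative to a similar stored object, and only otherwise compress an object on its own) can be illustrated with a small Python sketch. It is a hypothetical toy, not the PRESIDIO or VCAS implementation. The names `VirtualCAS`, `features`, `resemblance`, `SHINGLE`, `FEATURES`, and `THRESHOLD` are illustrative; identical data is detected by a SHA-256 content address; similarity is estimated with min-hash features over byte shingles (a Broder-style resemblance estimate, assumed here); and zlib compression with a preset dictionary stands in for a real delta encoder. In this sketch every object stays readable by its digest however it is physically stored, which is an assumed reading of the "virtual" in VCAS.

```python
import hashlib
import zlib

# Hypothetical tuning parameters, not values from the dissertation.
SHINGLE = 8      # shingle width in bytes
FEATURES = 16    # number of min-hash features per object
THRESHOLD = 0.5  # minimum estimated resemblance for delta encoding


def digest(data: bytes) -> str:
    """Content address of an object: its SHA-256 digest."""
    return hashlib.sha256(data).hexdigest()


def features(data: bytes) -> list:
    """Min-hash sketch: for each seed, the smallest keyed hash over all shingles."""
    count = max(1, len(data) - SHINGLE + 1)
    shingles = {data[i:i + SHINGLE] for i in range(count)}
    sketch = []
    for seed in range(FEATURES):
        key = seed.to_bytes(8, "little")
        sketch.append(min(
            int.from_bytes(hashlib.blake2b(s, digest_size=8, key=key).digest(), "little")
            for s in shingles
        ))
    return sketch


def resemblance(a: list, b: list) -> float:
    """Fraction of matching features, an estimate of shingle-set Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / FEATURES


class VirtualCAS:
    """Toy content-addressable store. Every object is read back by its digest,
    whether it is stored whole, as a delta against a similar object, or not
    stored again because an identical copy already exists."""

    def __init__(self) -> None:
        self.records = {}   # digest -> (method, base digest or None, payload)
        self.sketches = {}  # digest -> min-hash features of each stored object

    def put(self, data: bytes) -> str:
        key = digest(data)
        if key in self.records:  # Stage 1: identical data adds no new bytes.
            return key
        feats = features(data)
        # Stage 2: find the most similar stored object, if any.
        best, best_score = None, 0.0
        for other, other_feats in self.sketches.items():
            score = resemblance(feats, other_feats)
            if score > best_score:
                best, best_score = other, score
        if best is not None and best_score >= THRESHOLD:
            # Stage 3a: encode against the similar base. A zlib preset
            # dictionary stands in for a real delta encoder (assumption).
            enc = zlib.compressobj(zdict=self.get(best))
            self.records[key] = ("delta", best, enc.compress(data) + enc.flush())
        else:
            # Stage 3b: nothing similar enough, so compress the object alone.
            self.records[key] = ("zlib", None, zlib.compress(data))
        self.sketches[key] = feats
        return key

    def get(self, key: str) -> bytes:
        """Reconstruct an object from its record, following delta bases."""
        method, base, payload = self.records[key]
        if method == "delta":
            dec = zlib.decompressobj(zdict=self.get(base))
            return dec.decompress(payload) + dec.flush()
        return zlib.decompress(payload)


if __name__ == "__main__":
    store = VirtualCAS()
    base = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(125))
    edited = base[:2000] + b"EDIT" + base[2004:]
    for obj in (base, base, edited):
        key = store.put(obj)
        method, _, payload = store.records[key]
        print(key[:12], method, len(payload))
```

Storing the same object twice adds no record, and a lightly edited copy is stored as a small delta against the original. Because zlib's preset dictionary only reaches back 32 KiB, this stand-in is effective only for small objects; a production system would use a dedicated delta encoder and would also eliminate redundancy below the file level, which the sketch omits.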
This work is a key part of the Deep Store archival storage architecture, a large-scale storage system that stores immutable data efficiently and reliably for long periods of time over a cluster of nodes that record data to disk.