Deep Store: An Archival Storage System Architecture

Authors:
Lawrence L. You;Kristal T. Pollack;Darrell D. E. Long
Affiliations:
University of California at Santa Cruz;University of California at Santa Cruz;University of California at Santa Cruz
Venue:
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Year:
2005

Citing 12
Cited 35

Syntactic clustering of the Web

Selected papers from the sixth international conference on World Wide Web
XMill: an efficient compressor for XML data

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Min-wise independent permutations

Journal of Computer and System Sciences - 30th annual ACM symposium on theory of computing
A low-bandwidth network file system

SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Archiving scientific data

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Compactly encoding unstructured inputs with differential compression

Journal of the ACM (JACM)
GPFS: A Shared-Disk File System for Large Computing Clusters

FAST '02 Proceedings of the Conference on File and Storage Technologies
Venti: A New Approach to Archival Storage

FAST '02 Proceedings of the Conference on File and Storage Technologies
Rules of Thumb in Data Engineering

ICDE '00 Proceedings of the 16th International Conference on Data Engineering
The Google file system

SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Redundancy elimination within large collections of files

ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Alternatives for detecting redundancy in storage systems data

ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference

Challenges in managing dependable data systems

ACM SIGMETRICS Performance Evaluation Review - Design, implementation, and performance of storage systems
A fresh look at the reliability of long-term digital storage

Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
TAPER: tiered approach for eliminating redundancy in replica synchronization

FAST'05 Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies - Volume 4
Content-based document routing and index partitioning for scalable similarity-based searches in a large corpus

Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
POTSHARDS: secure long-term storage without encryption

ATC'07 2007 USENIX Annual Technical Conference on Proceedings of the USENIX Annual Technical Conference
Pergamum: replacing tape with energy efficient, reliable, disk-based archival storage

FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
Avoiding the disk bottleneck in the data domain deduplication file system

FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
Evaluating the usefulness of content addressable storage for high-performance data intensive applications

HPDC '08 Proceedings of the 17th international symposium on High performance distributed computing
Secure data deduplication

Proceedings of the 4th ACM international workshop on Storage security and survivability
Demystifying data deduplication

Proceedings of the ACM/IFIP/USENIX Middleware '08 Conference Companion
Preservation DataStores: new storage paradigm for preservation environments

IBM Journal of Research and Development
IZO: applications of large-window compression to virtual machine management

LISA'08 Proceedings of the 22nd conference on Large installation system administration conference
Sparse indexing: large scale, inline deduplication using sampling and locality

FAST '09 Proccedings of the 7th conference on File and storage technologies
HYDRAstor: a Scalable Secondary Storage

FAST '09 Proccedings of the 7th conference on File and storage technologies
Smoke and mirrors: reflecting files at a geographically remote location without loss of performance

FAST '09 Proccedings of the 7th conference on File and storage technologies
The effectiveness of deduplication on virtual machine disk images

SYSTOR '09 Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference
Multi-level comparison of data deduplication in a backup scenario

SYSTOR '09 Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference
POTSHARDS—a secure, recoverable, long-term archival storage system

ACM Transactions on Storage (TOS)
Multi-agent Based Personal File Management Using Case Based Reasoning

HAIS '09 Proceedings of the 4th International Conference on Hybrid Artificial Intelligence Systems
Enabling High Data Throughput in Desktop Grids through Decentralized Data and Metadata Management: The BlobSeer Approach

Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Managing computer files via artificial intelligence approaches

Artificial Intelligence Review
BlobSeer: how to enable efficient versioning for large object storage under heavy access concurrency

Proceedings of the 2009 EDBT/ICDT Workshops
Distributed archiving storage system based on CAS

VECIMS'09 Proceedings of the 2009 IEEE international conference on Virtual Environments, Human-Computer Interfaces and Measurement Systems
HydraFS: a high-throughput file system for the HYDRAstor content-addressable storage system

FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
Bimodal content defined chunking for backup streams

FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
Moving from logical sharing of guest OS to physical sharing of deduplication on virtual machine

HotSec'10 Proceedings of the 5th USENIX conference on Hot topics in security
PRESIDIO: A Framework for Efficient Archival Data Storage

ACM Transactions on Storage (TOS)
Anchor-driven subchunk deduplication

Proceedings of the 4th Annual International Conference on Systems and Storage
Analysis of Workload Behavior in Scientific and Historical Long-Term Data Repositories

ACM Transactions on Storage (TOS)
iDedup: latency-aware, inline data deduplication for primary storage

FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
Hash based disk imaging using AFF4

Digital Investigation: The International Journal of Digital Forensics & Incident Response
SeWDReSS: on the design of an application independent, secure, wide-area disaster recovery storage system

Multimedia Tools and Applications
SAFE: A Source Deduplication Framework for Efficient Cloud Backup Services

Journal of Signal Processing Systems
SBBS: A sliding blocking algorithm with backtracking sub-blocks for duplicate data detection

Expert Systems with Applications: An International Journal
Concurrent deletion in a distributed content-addressable storage system with global deduplication

FAST'13 Proceedings of the 11th USENIX conference on File and Storage Technologies

Quantified Score

Hi-index	0.00

Visualization

Abstract

We present the Deep Store archival storage architecture, a large-scale storage system that stores immutable dataefficiently and reliably for long periods of time. Archived data is stored across a cluster of nodes and recorded to hard disk. The design differentiates itself from traditional file systems by eliminating redundancy within and across files, distributing content for scalability, associating rich metadata with content, and using variable levels of replication based on the importance or degree of dependency of each piece of stored data. We evaluate the foundations of our design, including PRESIDIO, a virtual content-addressable storage framework with multiple methods for inter-file and intra-file compression that effectively addresses the data-dependent variability of data compression. We measure content and metadata storage efficiency, demonstrate the need for a variable-degree replication model, and provide preliminary results for storage performance.