The scope of archival systems is expanding beyond cheap tertiary storage: scientific and medical data is increasingly digital, and the public has a growing desire to digitally record their personal histories. Driven by the increasing cost efficiency of hard drives and the rise of the Internet, content archives have become a means of providing the public with fast, cheap access to long-term data. Unfortunately, designers of purpose-built archival systems are forced either to rely on workload behavior obtained from a narrow, anachronistic view of archives as simply cheap tertiary storage, or to extrapolate from marginally related enterprise workload data and traditional library access patterns. To close this knowledge gap and provide relevant input for the design of effective long-term data storage systems, we studied the workload behavior of several systems within this expanded archival storage space. Our study examined several scientific and historical archives, covering a mixture of purposes, media types, and access models (that is, public versus private). Our findings show that, for more traditional private scientific archival storage, files have become larger, but update rates have remained largely unchanged. In the public content archives we observed, however, behavior diverges from the traditional “write-once, read-maybe” behavior of tertiary storage. Our study shows that the majority of such data is modified relatively frequently, sometimes unnecessarily, and that indexing services such as Google and internal data management processes may routinely access large portions of an archive, accounting for most of the accesses. Based on these observations, we identify areas for improving the efficiency and performance of archival storage systems.