ACM Computing Surveys (CSUR)
Characteristics of files in NFS environments
SIGSMALL '91 Proceedings of the 1991 ACM SIGSMALL/PC symposium on Small systems
Measurements of a distributed file system
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
A large-scale study of file-system contents
SIGMETRICS '99 Proceedings of the 1999 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A trace-driven analysis of the UNIX 4.2 BSD file system
Proceedings of the tenth ACM symposium on Operating systems principles
A low-bandwidth network file system
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
A study of file sizes and functional lifetimes
SOSP '81 Proceedings of the eighth ACM symposium on Operating systems principles
Characteristics of I/O traffic in personal computer and server workloads
IBM Systems Journal
Improving duplicate elimination in storage systems
ACM Transactions on Storage (TOS)
Redundancy elimination within large collections of files
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Alternatives for detecting redundancy in storage systems data
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Single instance storage in Windows® 2000
WSS'00 Proceedings of the 4th conference on USENIX Windows Systems Symposium - Volume 4
A comparison of file system workloads
ATEC '00 Proceedings of the annual conference on USENIX Annual Technical Conference
A five-year study of file-system metadata
FAST '07 Proceedings of the 5th USENIX conference on File and Storage Technologies
Avoiding the disk bottleneck in the data domain deduplication file system
FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
Measurement and analysis of large-scale network file system workloads
ATC'08 USENIX 2008 Annual Technical Conference on Annual Technical Conference
Sparse indexing: large scale, inline deduplication using sampling and locality
FAST '09 Proccedings of the 7th conference on File and storage technologies
HYDRAstor: a Scalable Secondary Storage
FAST '09 Proccedings of the 7th conference on File and storage technologies
Bimodal content defined chunking for backup streams
FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
Characterizing datasets for data deduplication in backup applications
IISWC '10 Proceedings of the IEEE International Symposium on Workload Characterization (IISWC'10)
A study of practical deduplication
FAST'11 Proceedings of the 9th USENIX conference on File and stroage technologies
Tradeoffs in scalable data routing for deduplication clusters
FAST'11 Proceedings of the 9th USENIX conference on File and stroage technologies
Venti: a new approach to archival storage
FAST'02 Proceedings of the 1st USENIX conference on File and storage technologies
Building a high-performance deduplication system
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
HPCC '11 Proceedings of the 2011 IEEE International Conference on High Performance Computing and Communications
WAN optimized replication of backup datasets using stream-informed delta compression
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
SFS: random write considered harmful in solid state drives
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
Capacity forecasting in a backup storage environment
LISA'11 Proceedings of the 25th international conference on Large Installation System Administration
WAN optimized replication of backup datasets using stream-informed delta compression
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
Power consumption in enterprise-scale backup storage systems
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
Generating realistic datasets for deduplication analysis
USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
Insights for data reduction in primary storage: a practical analysis
Proceedings of the 5th Annual International Systems and Storage Conference
WAN-optimized replication of backup datasets using stream-informed delta compression
ACM Transactions on Storage (TOS)
A study on data deduplication in HPC storage systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Probabilistic deduplication for cluster-based storage systems
Proceedings of the Third ACM Symposium on Cloud Computing
Reducing Storage Overhead with Small Write Bottleneck Avoiding in Cloud RAID System
GRID '12 Proceedings of the 2012 ACM/IEEE 13th International Conference on Grid Computing
A scalable inline cluster deduplication framework for big data protection
Proceedings of the 13th International Middleware Conference
Block locality caching for data deduplication
Proceedings of the 6th International Systems and Storage Conference
Building intelligence for software defined data centers: modeling usage patterns
Proceedings of the 6th International Systems and Storage Conference
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
ROOT: replaying multithreaded traces with resource-oriented ordering
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
DupLESS: server-aided encryption for deduplicated storage
SEC'13 Proceedings of the 22nd USENIX conference on Security
Efficiently storing virtual machine backups
HotStorage'13 Proceedings of the 5th USENIX conference on Hot Topics in Storage and File Systems
Characterization of incremental data changes for efficient data protection
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
Memory efficient sanitization of a deduplicated storage system
FAST'13 Proceedings of the 11th USENIX conference on File and Storage Technologies
File recipe compression in data deduplication systems
FAST'13 Proceedings of the 11th USENIX conference on File and Storage Technologies
Improving restore speed for backup systems that use inline chunk-based deduplication
FAST'13 Proceedings of the 11th USENIX conference on File and Storage Technologies
(Big)data in a virtualized world: volume, velocity, and variety in cloud datacenters
FAST'14 Proceedings of the 12th USENIX conference on File and Storage Technologies
Migratory compression: coarse-grained data reordering to improve compressibility
FAST'14 Proceedings of the 12th USENIX conference on File and Storage Technologies
Hi-index | 0.00 |
Data-protection class workloads, including backup and long-term retention of data, have seen a strong industry shift from tape-based platforms to disk-based systems. But the latter are traditionally designed to serve as primary storage and there has been little published analysis of the characteristics of backup workloads as they relate to the design of disk-based systems. In this paper, we present a comprehensive characterization of backup workloads by analyzing statistics and content metadata collected from a large set of EMC Data Domain backup systems in production use. This analysis is both broad (encompassing statistics from over 10,000 systems) and deep (using detailed metadata traces from several production systems storing almost 700TB of backup data). We compare these systems to a detailed study of Microsoft primary storage systems [22], showing that backup storage differs significantly from their primary storage workload in the amount of data churn and capacity requirements as well as the amount of redundancy within the data. These properties bring unique challenges and opportunities when designing a disk-based filesystem for backup workloads, which we explore in more detail using the metadata traces. In particular, the need to handle high churn while leveraging high data redundancy is considered by looking at deduplication unit size and caching efficiency.