Deduplication is a popular component of modern storage systems, and a wide variety of approaches exist. Unlike the performance of traditional storage systems, deduplication performance depends on data content as well as on access patterns and metadata characteristics. Most datasets that have been used to evaluate deduplication systems are either unrepresentative or unavailable due to privacy concerns, which prevents easy comparison of competing algorithms. Understanding how both content and metadata evolve is critical to the realistic evaluation of deduplication systems. We developed a generic model of file system changes based on properties measured on terabytes of data from real, diverse storage systems. Our model plugs into a generic framework for emulating file system changes. Building on observations from specific environments, the model can generate an initial file system followed by ongoing modifications that emulate the distribution of duplicates and file sizes, realistic changes to existing files, and file system growth. In our experiments, we generated a 4TB dataset within 13 hours on a machine with a single disk drive. The relative error of the emulated parameters depends on the model size but remains within 15% of real-world observations.
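The abstract describes the generator only at a high level. The Python below is a minimal sketch of the general idea, not the authors' implementation. It models a file system as files made of chunk content IDs, builds an initial snapshot, then applies one "generation" of in-place modifications plus growth, and measures the resulting duplicate fraction and its relative error against a target. The fixed 4 KiB chunk size, the lognormal file-size distribution, the single `dup_ratio` knob, and every name (`EmulatedFS`, `pick_chunk`, `mutate`, and so on) are hypothetical assumptions chosen for illustration.

```python
from __future__ import annotations

import math
import random
from dataclasses import dataclass, field

# Hypothetical fixed chunk size (bytes); an assumption, not from the source.
CHUNK_SIZE = 4096


@dataclass
class EmulatedFS:
    """In-memory stand-in for a file system: each file is a list of chunk content IDs."""
    files: dict[str, list[int]] = field(default_factory=dict)
    pool: list[int] = field(default_factory=list)  # every chunk ID ever written, with repeats
    next_id: int = 0
    next_file: int = 0


def pick_chunk(fs: EmulatedFS, rng: random.Random, dup_ratio: float) -> int:
    """Reuse previously written content with probability dup_ratio, else create new content."""
    if fs.pool and rng.random() < dup_ratio:
        cid = rng.choice(fs.pool)  # frequently written chunks are proportionally more likely to recur
    else:
        cid = fs.next_id
        fs.next_id += 1
    fs.pool.append(cid)
    return cid


def add_file(fs: EmulatedFS, rng: random.Random, dup_ratio: float, mu: float, sigma: float) -> None:
    """Create one file whose size is drawn from a lognormal distribution (an assumption)."""
    size = rng.lognormvariate(mu, sigma)
    n_chunks = max(1, math.ceil(size / CHUNK_SIZE))
    path = f"/emu/f{fs.next_file}"  # hypothetical naming scheme
    fs.next_file += 1
    fs.files[path] = [pick_chunk(fs, rng, dup_ratio) for _ in range(n_chunks)]


def generate_initial(n_files: int, dup_ratio: float, mu: float = 10.0,
                     sigma: float = 1.0, seed: int = 0) -> tuple[EmulatedFS, random.Random]:
    """Build the initial file system snapshot."""
    rng = random.Random(seed)
    fs = EmulatedFS()
    for _ in range(n_files):
        add_file(fs, rng, dup_ratio, mu, sigma)
    return fs, rng


def mutate(fs: EmulatedFS, rng: random.Random, modify_frac: float, growth_files: int,
           dup_ratio: float, mu: float = 10.0, sigma: float = 1.0) -> None:
    """One emulated generation: rewrite one chunk in a fraction of files, then grow the FS."""
    paths = sorted(fs.files)
    for path in rng.sample(paths, int(len(paths) * modify_frac)):
        chunks = fs.files[path]
        i = rng.randrange(len(chunks))
        chunks[i] = pick_chunk(fs, rng, dup_ratio)  # in-place change to an existing file
    for _ in range(growth_files):
        add_file(fs, rng, dup_ratio, mu, sigma)


def duplicate_fraction(fs: EmulatedFS) -> float:
    """Fraction of chunks in the current state that a deduplicator could eliminate."""
    all_chunks = [c for chunks in fs.files.values() for c in chunks]
    return 1 - len(set(all_chunks)) / len(all_chunks)


def relative_error(emulated: float, observed: float) -> float:
    return abs(emulated - observed) / abs(observed)


if __name__ == "__main__":
    fs, rng = generate_initial(500, dup_ratio=0.3)
    print(round(duplicate_fraction(fs), 3))  # should be close to dup_ratio
    n_before = len(fs.files)
    mutate(fs, rng, modify_frac=0.1, growth_files=50, dup_ratio=0.3)
    print(len(fs.files) - n_before)  # 50 new files from growth
    print(round(relative_error(duplicate_fraction(fs), 0.3), 3))
```

Drawing duplicates from a history pool biases reuse toward content that has already been written many times, which is one simple way to get a skewed duplicate distribution; a faithful generator would instead fit such distributions to measured traces.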