File system aging—increasing the relevance of file system benchmarks
SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Windows NT file system internals: a developer's guide
Windows NT file system internals: a developer's guide
A large-scale study of file-system contents
SIGMETRICS '99 Proceedings of the 1999 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
File system usage in Windows NT 4.0
Proceedings of the seventeenth ACM symposium on Operating systems principles
Space/time trade-offs in hash coding with allowable errors
Communications of the ACM
Venti: A New Approach to Archival Storage
FAST '02 Proceedings of the Conference on File and Storage Technologies
A study of file sizes and functional lifetimes
SOSP '81 Proceedings of the eighth ACM symposium on Operating systems principles
Proceedings of the twentieth ACM symposium on Operating systems principles
Redundancy elimination within large collections of files
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Alternatives for detecting redundancy in storage systems data
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Single instance storage in Windows® 2000
WSS'00 Proceedings of the 4th conference on USENIX Windows Systems Symposium - Volume 4
A five-year study of file-system metadata
FAST '07 Proceedings of the 5th USENIX conference on File and Storage Technologies
Scalability in the XFS file system
ATEC '96 Proceedings of the 1996 annual conference on USENIX Annual Technical Conference
Avoiding the disk bottleneck in the data domain deduplication file system
FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
Sparse indexing: large scale, inline deduplication using sampling and locality
FAST '09 Proccedings of the 7th conference on File and storage technologies
BORG: block-reORGanization for self-optimizing storage systems
FAST '09 Proccedings of the 7th conference on File and storage technologies
HYDRAstor: a Scalable Secondary Storage
FAST '09 Proccedings of the 7th conference on File and storage technologies
The effectiveness of deduplication on virtual machine disk images
SYSTOR '09 Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference
HydraFS: a high-throughput file system for the HYDRAstor content-addressable storage system
FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
Bimodal content defined chunking for backup streams
FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
Hierarchical file systems are dead
HotOS'09 Proceedings of the 12th conference on Hot topics in operating systems
Decentralized deduplication in SAN cluster file systems
USENIX'09 Proceedings of the 2009 conference on USENIX Annual technical conference
Tradeoffs in scalable data routing for deduplication clusters
FAST'11 Proceedings of the 9th USENIX conference on File and stroage technologies
Tradeoffs in scalable data routing for deduplication clusters
FAST'11 Proceedings of the 9th USENIX conference on File and stroage technologies
Design implications for enterprise storage systems via multi-dimensional trace analysis
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Differentiated storage services
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
An empirical analysis of similarity in virtual machine images
Proceedings of the Middleware 2011 Industry Track Workshop
GHOST: GPGPU-offloaded high performance storage I/O deduplication for primary storage system
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Robust benchmarking for archival storage tiers
Proceedings of the sixth workshop on Parallel Data Storage
File routing middleware for cloud deduplication
Proceedings of the 2nd International Workshop on Cloud Computing Platforms
Characteristics of backup workloads in production systems
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
Recon: verifying file system consistency at runtime
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
iDedup: latency-aware, inline data deduplication for primary storage
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
Content-aware load balancing for distributed backup
LISA'11 Proceedings of the 25th international conference on Large Installation System Administration
Efficient storage of virtual machine images
Proceedings of the 3rd workshop on Scientific Cloud Computing Date
Generating realistic datasets for deduplication analysis
USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
Primary data deduplication-large scale study and system design
USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
Insights for data reduction in primary storage: a practical analysis
Proceedings of the 5th Annual International Systems and Storage Conference
GANGRENE: exploring the mortality of flash memory
HotSec'12 Proceedings of the 7th USENIX conference on Hot Topics in Security
Benchmarking and modeling disk-based storage tiers for practical storage design
ACM SIGMETRICS Performance Evaluation Review
Recon: Verifying file system consistency at runtime
ACM Transactions on Storage (TOS)
Probabilistic deduplication for cluster-based storage systems
Proceedings of the Third ACM Symposium on Cloud Computing
Direct lookup and hash-based metadata placement for local file systems
Proceedings of the 6th International Systems and Storage Conference
Proceedings of the 8th International Conference on Network and Service Management
SAFE: A Source Deduplication Framework for Efficient Cloud Backup Services
Journal of Signal Processing Systems
Read-Performance Optimization for Deduplication-Based Storage Systems in the Cloud
ACM Transactions on Storage (TOS)
DEDIS: distributed exact deduplication for primary storage infrastructures
Proceedings of the 4th annual Symposium on Cloud Computing
TABLEFS: enhancing metadata efficiency in the local file system
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
SBBS: A sliding blocking algorithm with backtracking sub-blocks for duplicate data detection
Expert Systems with Applications: An International Journal
Virtual machine workloads: the case for new benchmarks for NAS
FAST'13 Proceedings of the 11th USENIX conference on File and Storage Technologies
(Big)data in a virtualized world: volume, velocity, and variety in cloud datacenters
FAST'14 Proceedings of the 12th USENIX conference on File and Storage Technologies
Checking the integrity of transactional mechanisms
FAST'14 Proceedings of the 12th USENIX conference on File and Storage Technologies
Hi-index | 0.00 |
We collected file system content data from 857 desktop computers at Microsoft over a span of 4 weeks. We analyzed the data to determine the relative efficacy of data deduplication, particularly considering whole-file versus block-level elimination of redundancy. We found that whole-file deduplication achieves about three quarters of the space savings of the most aggressive block-level deduplication for storage of live file systems, and 87% of the savings for backup images. We also studied file fragmentation finding that it is not prevalent, and updated prior file system metadata studies, finding that the distribution of file sizes continues to skew toward very large unstructured files.