File system aging—increasing the relevance of file system benchmarks
SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Windows NT file system internals: a developer's guide
Windows NT file system internals: a developer's guide
A large-scale study of file-system contents
SIGMETRICS '99 Proceedings of the 1999 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
File system usage in Windows NT 4.0
Proceedings of the seventeenth ACM symposium on Operating systems principles
Space/time trade-offs in hash coding with allowable errors
Communications of the ACM
A study of file sizes and functional lifetimes
SOSP '81 Proceedings of the eighth ACM symposium on Operating systems principles
Proceedings of the twentieth ACM symposium on Operating systems principles
Redundancy elimination within large collections of files
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Alternatives for detecting redundancy in storage systems data
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Single instance storage in Windows® 2000
WSS'00 Proceedings of the 4th conference on USENIX Windows Systems Symposium - Volume 4
A five-year study of file-system metadata
FAST '07 Proceedings of the 5th USENIX conference on File and Storage Technologies
Scalability in the XFS file system
ATEC '96 Proceedings of the 1996 annual conference on USENIX Annual Technical Conference
Avoiding the disk bottleneck in the data domain deduplication file system
FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
Sparse indexing: large scale, inline deduplication using sampling and locality
FAST '09 Proccedings of the 7th conference on File and storage technologies
BORG: block-reORGanization for self-optimizing storage systems
FAST '09 Proccedings of the 7th conference on File and storage technologies
HYDRAstor: a Scalable Secondary Storage
FAST '09 Proccedings of the 7th conference on File and storage technologies
The effectiveness of deduplication on virtual machine disk images
SYSTOR '09 Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference
HydraFS: a high-throughput file system for the HYDRAstor content-addressable storage system
FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
Bimodal content defined chunking for backup streams
FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
Hierarchical file systems are dead
HotOS'09 Proceedings of the 12th conference on Hot topics in operating systems
Decentralized deduplication in SAN cluster file systems
USENIX'09 Proceedings of the 2009 conference on USENIX Annual technical conference
Tradeoffs in scalable data routing for deduplication clusters
FAST'11 Proceedings of the 9th USENIX conference on File and stroage technologies
Venti: a new approach to archival storage
FAST'02 Proceedings of the 1st USENIX conference on File and storage technologies
ISOBAR hybrid compression-I/O interleaving for large-scale parallel I/O optimization
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
A study on data deduplication in HPC storage systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
µLibCloud: Providing High Available and Uniform Accessing to Multiple Cloud Storages
GRID '12 Proceedings of the 2012 ACM/IEEE 13th International Conference on Grid Computing
Memorage: emerging persistent RAM based malleable main memory and storage architecture
Proceedings of the 27th international ACM conference on International conference on supercomputing
Rangoli: space management in deduplication environments
Proceedings of the 6th International Systems and Storage Conference
Content-based chunk placement scheme for decentralized deduplication on distributed file systems
ICCSA'13 Proceedings of the 13th international conference on Computational Science and Its Applications - Volume 1
DupLESS: server-aided encryption for deduplicated storage
SEC'13 Proceedings of the 22nd USENIX conference on Security
File recipe compression in data deduplication systems
FAST'13 Proceedings of the 11th USENIX conference on File and Storage Technologies
Hi-index | 0.00 |
We collected file system content data from 857 desktop computers at Microsoft over a span of 4 weeks. We analyzed the data to determine the relative efficacy of data deduplication, particularly considering whole-file versus block-level elimination of redundancy. We found that whole-file deduplication achieves about three quarters of the space savings of the most aggressive block-level deduplication for storage of live file systems, and 87% of the savings for backup images. We also studied file fragmentation, finding that it is not prevalent, and updated prior file system metadata studies, finding that the distribution of file sizes continues to skew toward very large unstructured files.