Slow restoration due to chunk fragmentation is a serious problem facing inline chunk-based data deduplication systems: restore speeds for the most recent backup can drop by orders of magnitude over the lifetime of a system. We study three techniques for alleviating this problem: increasing cache size, container capping, and using a forward assembly area. Container capping is an ingest-time operation that reduces chunk fragmentation at the cost of forfeiting some deduplication. Using a forward assembly area is a new restore-time caching and prefetching technique; it exploits the perfect knowledge of future chunk accesses available when restoring a backup to reduce the amount of RAM required for a given level of caching at restore time. We show that using a larger cache per stream can produce up to a 5-16X improvement in restore speed (we see continuing benefits even up to 8 GB), that giving up as little as 8% deduplication with capping can yield a 2-6X improvement, and that using a forward assembly area is strictly superior to LRU, able to yield a 2-4X improvement while holding the RAM budget constant.
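To make the forward assembly area idea concrete, the following is a minimal sketch and not necessarily the authors' implementation. It assumes a backup is described by a recipe listing, in restore order, which container holds each chunk. Because the recipe is known in advance, the restorer can fill a fixed-size buffer one window at a time and read each needed container only once per window. The names `restore_with_faa`, `recipe`, `read_container`, and `faa_bytes` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def restore_with_faa(recipe, read_container, faa_bytes):
    """Restore a backup through a fixed-size assembly buffer.

    recipe: list of (container_id, chunk_id, length) in restore order
            (an assumed format for this sketch).
    read_container: callable mapping container_id -> {chunk_id: bytes}.
    faa_bytes: size of the assembly buffer in bytes.
    Returns (restored_bytes, number_of_container_reads).
    """
    out = bytearray()
    reads = 0
    i = 0
    while i < len(recipe):
        # Take the longest run of upcoming chunks that fits in the buffer.
        # A single chunk larger than the buffer is still taken on its own.
        window = []
        used = 0
        while i < len(recipe) and (not window or used + recipe[i][2] <= faa_bytes):
            cid, chunk_id, length = recipe[i]
            window.append((used, cid, chunk_id, length))
            used += length
            i += 1

        # Group the buffer slots by the container that supplies them.
        slots_by_container = defaultdict(list)
        for offset, cid, chunk_id, length in window:
            slots_by_container[cid].append((offset, chunk_id, length))

        # Read each container once and copy its chunks into every slot
        # of the current window that needs them.
        area = bytearray(used)
        for cid, slots in slots_by_container.items():
            chunks = read_container(cid)
            reads += 1
            for offset, chunk_id, length in slots:
                area[offset:offset + length] = chunks[chunk_id]

        out += area
    return bytes(out), reads
```

In this sketch all of the RAM budget holds data that is about to be written out, rather than whole cached containers that may contain unneeded chunks; a larger `faa_bytes` lets more references to the same container share a single read.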