Duplication of data in storage systems is becoming increasingly common. We introduce I/O Deduplication, a storage optimization that uses content similarity to improve I/O performance by eliminating I/O operations and reducing mechanical delays during I/O operations. I/O Deduplication consists of three main techniques: content-based caching, dynamic replica retrieval, and selective duplication. Each of these techniques is motivated by our observations of I/O workload traces obtained from actively used production storage systems, all of which revealed surprisingly high levels of content similarity for both stored and accessed data. Evaluation of a prototype implementation using these workloads showed an overall improvement in disk I/O performance of 28% to 47%. A further breakdown showed that each of the three techniques contributed significantly to the overall improvement.
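As a rough illustration of the first technique, content-based caching, the sketch below indexes cached blocks by a hash of their content rather than by sector number, so that sectors holding identical data share a single cache entry. This is a minimal sketch under stated assumptions, not the prototype's implementation: the class name `ContentCache`, the choice of SHA-256, the LRU eviction policy, and the in-memory sector-to-digest map are all hypothetical choices made for the example.

```python
import hashlib
from collections import OrderedDict


class ContentCache:
    """Toy LRU cache keyed by content hash instead of sector number.

    Hypothetical illustration only; names and policies are assumptions.
    """

    def __init__(self, capacity):
        self.capacity = capacity        # max number of distinct block contents
        self.sector_to_digest = {}      # sector number -> content digest (unbounded here)
        self.blocks = OrderedDict()     # content digest -> block bytes, in LRU order

    def insert(self, sector, data):
        """Record that `sector` holds `data`; identical contents are stored once."""
        digest = hashlib.sha256(data).hexdigest()
        self.sector_to_digest[sector] = digest
        self.blocks[digest] = data
        self.blocks.move_to_end(digest)         # mark as most recently used
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict least recently used content

    def lookup(self, sector):
        """Return the cached bytes for `sector`, or None on a miss."""
        digest = self.sector_to_digest.get(sector)
        if digest is None or digest not in self.blocks:
            return None
        self.blocks.move_to_end(digest)
        return self.blocks[digest]


cache = ContentCache(capacity=2)
cache.insert(1, b"A" * 512)
cache.insert(2, b"A" * 512)   # same content as sector 1: no extra cache slot used
cache.insert(3, b"B" * 512)
```

In this sketch, three sectors fit in a cache sized for two distinct blocks because sectors 1 and 2 share one entry. The more content similarity a workload has, the more sectors a cache of a given size can serve, which is the intuition behind avoiding I/O operations through content-based caching.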