Storage systems frequently maintain identical copies of data. Identifying such data can assist in the design of solutions in which data storage, transmission, and management are optimised. In this paper we evaluate three methods used to discover identical portions of data: whole-file content hashing, fixed-size blocking, and a chunking strategy that uses Rabin fingerprints to delimit content-defined data chunks. We assess how effective each strategy is at finding identical sections of data. In our experiments, we analysed diverse data sets from several types of storage system, including a mirrored section of sunsite.org.uk, different data profiles in the file system infrastructure of the Cambridge University Computer Laboratory, source code distribution trees, compressed data, and packed files. We report our experimental results and present a comparative analysis of these techniques. This study also shows how levels of similarity differ between data sets and file types. Finally, we discuss the advantages and disadvantages of applying these methods in the light of our experimental results.
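To make the three strategies named in the abstract concrete, the following is a minimal Python sketch rather than the implementation evaluated in the paper. It makes several assumptions that do not come from the source: SHA-1 is used as the content hash, the block size is 4096 bytes, and the content-defined chunker uses a Rabin-Karp style polynomial rolling hash as a simple stand-in for true Rabin fingerprints (which use polynomial arithmetic over GF(2)). All function names and the parameters `window`, `divisor`, `BASE`, and `MOD` are hypothetical.

```python
import hashlib
import random

BASE = 257            # illustrative polynomial base (assumption, not from the paper)
MOD = (1 << 61) - 1   # illustrative prime modulus (assumption, not from the paper)


def whole_file_units(data: bytes) -> list:
    """Whole-file hashing: the entire file is the only unit that is compared."""
    return [data]


def fixed_size_blocks(data: bytes, block_size: int = 4096) -> list:
    """Fixed-size blocking: cut at every multiple of block_size."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def content_defined_chunks(data: bytes, window: int = 16, divisor: int = 64) -> list:
    """Cut wherever the rolling hash of the last `window` bytes hits a chosen value.

    The hash is a polynomial rolling hash, used here in place of a Rabin
    fingerprint. Boundaries depend only on nearby content, not on file offsets.
    """
    chunks = []
    start = 0
    h = 0
    top = pow(BASE, window - 1, MOD)  # weight of the byte that leaves the window
    for i, byte in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * top) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # take in the newest byte
        if i + 1 >= window and h % divisor == divisor - 1:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks


def shared_bytes(a_units: list, b_units: list) -> int:
    """Count bytes in b_units whose SHA-1 digest also occurs among a_units."""
    known = {hashlib.sha1(u).digest() for u in a_units}
    return sum(len(u) for u in b_units if hashlib.sha1(u).digest() in known)


def compare(a: bytes, b: bytes) -> dict:
    """Bytes of b that each strategy identifies as already present in a."""
    return {
        "whole_file": shared_bytes(whole_file_units(a), whole_file_units(b)),
        "fixed_size": shared_bytes(fixed_size_blocks(a), fixed_size_blocks(b)),
        "content_defined": shared_bytes(content_defined_chunks(a),
                                        content_defined_chunks(b)),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    original = bytes(rng.getrandbits(8) for _ in range(20000))
    edited = b"X" + original  # a one-byte insertion at the front
    print(compare(original, edited))
```

The demonstration inserts a single byte at the start of a file. That insertion changes the whole-file digest and shifts every fixed-size block. The content-defined boundaries fall at the same content positions once the rolling window has moved past the inserted byte, so most chunks can still be matched. A practical chunker would normally also enforce minimum and maximum chunk sizes, which this sketch omits for brevity.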