Fast parallel and serial approximate string matching
Journal of Algorithms
Copy detection mechanisms for digital documents
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Building a scalable and accurate copy detection mechanism
Proceedings of the first ACM international conference on Digital libraries
Min-wise independent permutations (extended abstract)
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Syntactic clustering of the Web
Selected papers from the sixth international conference on World Wide Web
Min-Wise versus linear independence (extended abstract)
SODA '00 Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms
A fast string searching algorithm
Communications of the ACM
Space/time trade-offs in hash coding with allowable errors
Communications of the ACM
A guided tour to approximate string matching
ACM Computing Surveys (CSUR)
Venti: A New Approach to Archival Storage
FAST '02 Proceedings of the Conference on File and Storage Technologies
Finding Near-Replicas of Documents and Servers on the Web
WebDB '98 Selected papers from the International Workshop on The World Wide Web and Databases
On the Resemblance and Containment of Documents
SEQUENCES '97 Proceedings of the Compression and Complexity of Sequences 1997
Faster algorithms for string matching with k mismatches
Journal of Algorithms - Special issue: SODA 2000
Efficient randomized pattern-matching algorithms
IBM Journal of Research and Development - Mathematics and computing
Improving duplicate elimination in storage systems
ACM Transactions on Storage (TOS)
Redundancy elimination within large collections of files
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
TAPER: tiered approach for eliminating redundancy in replica synchronization
FAST'05 Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies - Volume 4
Finding similar files in a large file system
WTEC'94 Proceedings of the USENIX Winter 1994 Technical Conference on USENIX Winter 1994 Technical Conference
Model checking electronic commerce protocols
WOEC'96 Proceedings of the 2nd conference on Proceedings of the Second USENIX Workshop on Electronic Commerce - Volume 2
Avoiding the disk bottleneck in the data domain deduplication file system
FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
Linear pattern matching algorithms
SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973)
Sparse indexing: large scale, inline deduplication using sampling and locality
FAST '09 Proccedings of the 7th conference on File and storage technologies
Efficient locally trackable deduplication in replicated systems
Proceedings of the 10th ACM/IFIP/USENIX International Conference on Middleware
Efficient locally trackable deduplication in replicated systems
Middleware'09 Proceedings of the ACM/IFIP/USENIX 10th international conference on Middleware
Anchor-driven subchunk deduplication
Proceedings of the 4th Annual International Conference on Systems and Storage
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
WAN optimized replication of backup datasets using stream-informed delta compression
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
Delta compressed and deduplicated storage using stream-informed locality
HotStorage'12 Proceedings of the 4th USENIX conference on Hot Topics in Storage and File Systems
Reducing impact of data fragmentation caused by in-line deduplication
Proceedings of the 5th Annual International Systems and Storage Conference
WAN-optimized replication of backup datasets using stream-informed delta compression
ACM Transactions on Storage (TOS)
A study on data deduplication in HPC storage systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Proceedings of the 6th International Systems and Storage Conference
Dynamic Synchronous/Asynchronous Replication
ACM Transactions on Storage (TOS)
Improving deduplication techniques by accelerating remainder calculations
Discrete Applied Mathematics
Migratory compression: coarse-grained data reordering to improve compressibility
FAST'14 Proceedings of the 12th USENIX conference on File and Storage Technologies
Hi-index | 0.00 |
We describe some of the design choices that were made during the development of a fast, scalable, inline, deduplication device. The system's design goals and how they were achieved are presented. This is the firs deduplication device that uses similarity matching. The paper provides the following original research contributions: we show how similarity signatures can serve in a deduplication scheme; a novel type of similarity signatures is presented and its advantages in the context of deduplication requirements are explained. It is also shown how to combine similarity matching schemes with byte by byte comparison or hash based identity schemes.